Shape-based virtual screening with volumetric aligned molecular shapes
J Comput Chem. 2014 Sep 30;35(25):1824-34.
doi: 10.1002/jcc.23690. Epub 2014 Jul 22.

David Ryan Koes et al. J Comput Chem. 2014.

Abstract

Shape-based virtual screening is an established and effective method for identifying small molecules that are similar in shape and function to a reference ligand. We describe a new method of shape-based virtual screening, volumetric aligned molecular shapes (VAMS). VAMS uses efficient data structures to encode and search molecular shapes. We demonstrate that VAMS is an effective method for shape-based virtual screening and that it can be successfully used as a prefilter to accelerate more computationally demanding search algorithms. Unique to VAMS is a novel minimum/maximum shape constraint query for precisely specifying the desired molecular shape. Shape constraint searches in VAMS are particularly efficient, and millions of shapes can be searched in a fraction of a second. We compare the performance of VAMS with two other shape-based virtual screening algorithms on a benchmark of 102 protein targets consisting of more than 32 million molecular shapes and find that VAMS provides a competitive trade-off between run-time performance and virtual screening performance.

Keywords: GSS tree; molecular shape; shape constraints; shape indexing; virtual screening.

Figures

Figure 1
The reference ligand and receptor for the AmpC beta-lactamase target from the DUD-E benchmark. (a) The ligand (green) and receptor (blue) shown with molecular surfaces and (b) a cutaway of the voxelization of these molecular shapes at a 0.5 Å resolution. Images generated with PyMOL and Sproxel.
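A minimal sketch of the voxelization idea in this caption. It assumes atoms are given as spheres `(x, y, z, radius)` in ångströms and that a voxel counts as occupied when its center lies inside any sphere; the function name `voxelize` and this occupancy rule are illustrative assumptions, not the paper's implementation, which may define the molecular surface differently.

```python
import math
from itertools import product

def voxelize(atoms, resolution=0.5):
    """Return the set of integer (i, j, k) voxels occupied by the atoms.

    atoms: iterable of (x, y, z, radius) tuples in angstroms (assumed format).
    A voxel is occupied if its center falls inside at least one atom sphere.
    """
    occupied = set()
    for x, y, z, r in atoms:
        # Index ranges of the voxels that could intersect this sphere.
        ranges = [
            range(math.floor((c - r) / resolution),
                  math.ceil((c + r) / resolution) + 1)
            for c in (x, y, z)
        ]
        for i, j, k in product(*ranges):
            # Voxel center in angstroms.
            cx, cy, cz = ((n + 0.5) * resolution for n in (i, j, k))
            if (cx - x) ** 2 + (cy - y) ** 2 + (cz - z) ** 2 <= r * r:
                occupied.add((i, j, k))
    return occupied
```

Representing a shape as a set of voxel indices keeps later comparisons (containment, overlap) simple set operations, at the cost of memory efficiency compared with a packed bit grid.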
Figure 2
The AmpC reference ligand (yellow) shown with minimum (solid green) and maximum (mesh green) volumetric shape constraints defined only by the ligand. The shape constraints were created by shrinking/growing the ligand volume by a gap distance of 2 Å.
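One way to picture shrinking and growing a volume by a gap distance is morphological erosion and dilation on a voxel set. The sketch below assumes shapes are sets of integer `(i, j, k)` voxel indices and uses a discrete ball as the structuring element; the helper names `ball_offsets`, `grow`, and `shrink` are hypothetical, and the paper may derive its constraints differently.

```python
from itertools import product

def ball_offsets(gap, resolution=0.5):
    """Voxel offsets within `gap` angstroms of the origin (discrete ball)."""
    n = round(gap / resolution)
    return [
        (i, j, k)
        for i, j, k in product(range(-n, n + 1), repeat=3)
        if i * i + j * j + k * k <= n * n
    ]

def grow(shape, offsets):
    """Dilation: every voxel within the ball of some occupied voxel."""
    return {(x + i, y + j, z + k) for x, y, z in shape for i, j, k in offsets}

def shrink(shape, offsets):
    """Erosion: occupied voxels whose entire ball neighborhood is occupied."""
    return {
        (x, y, z)
        for x, y, z in shape
        if all((x + i, y + j, z + k) in shape for i, j, k in offsets)
    }
```

With these helpers, a ligand-only query in the spirit of this figure would use `shrink(ligand, offsets)` as the minimum constraint and `grow(ligand, offsets)` as the maximum constraint.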
Figure 3
The AmpC reference ligand (yellow sticks) shown with a minimum shape constraint (green) derived from the ligand and the inverse of a maximum shape constraint (blue) derived from the reference receptor. Shape constraints are created by shrinking the volume of the ligand/receptor by a specific gap distance. Shape constraints are shown for gap distances of (a) 1 Å, (b) 1.5 Å, and (c) 2 Å. Together, these shape constraints define a query that selects molecular shapes that fully contain the green volume and do not overlap the blue volume. Images generated with VMD.
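The query semantics described in this caption can be sketched with plain set operations, assuming each shape is a set of integer `(i, j, k)` voxel indices. The names `satisfies_constraints` and `linear_scan` are illustrative, and this is a straightforward linear scan rather than the indexed search the paper describes.

```python
def satisfies_constraints(shape, minimum, excluded):
    """True if `shape` fully contains `minimum` and does not touch `excluded`.

    minimum:  voxels the shape must cover (the green volume).
    excluded: voxels the shape must avoid (the blue volume, i.e. the
              inverse of the maximum constraint).
    """
    return minimum <= shape and shape.isdisjoint(excluded)

def linear_scan(database, minimum, excluded):
    """Check every shape in a {name: voxel_set} mapping against the query."""
    return [
        name
        for name, shape in database.items()
        if satisfies_constraints(shape, minimum, excluded)
    ]
```

For example, a shape that covers the minimum volume but also occupies a single excluded voxel is rejected, which is what makes the receptor-derived constraint act like a steric clash filter.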
Figure 4
An illustration of a GSS-tree node with two leaves. The union of the molecular shapes in the leaves forms the Maximum Surrounding Volume (MSV), while their intersection forms the Minimum Included Volume (MIV).
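The MIV and MSV definitions in this caption suggest a simple pruning rule, sketched below for shapes stored as sets of integer `(i, j, k)` voxel indices. Because every shape S under a node satisfies MIV ⊆ S ⊆ MSV, no shape can cover a minimum constraint that the MSV does not cover, and every shape hits an excluded volume that the MIV hits. The dictionary layout and function names are assumptions for illustration; the actual GSS-tree implementation is not shown in the source.

```python
def make_node(leaves):
    """Build a one-level node from a non-empty list of (name, voxel_set) leaves."""
    shapes = [set(shape) for _, shape in leaves]
    return {
        "leaves": leaves,
        "msv": set().union(*shapes),                # union of all leaf shapes
        "miv": shapes[0].intersection(*shapes[1:]),  # intersection of all leaf shapes
    }

def may_contain_match(node, minimum, excluded):
    """False only when no shape under the node can satisfy the query."""
    return minimum <= node["msv"] and node["miv"].isdisjoint(excluded)

def search_node(node, minimum, excluded):
    """Return matching leaf names, skipping the node entirely if it is pruned."""
    if not may_contain_match(node, minimum, excluded):
        return []
    return [
        name
        for name, shape in node["leaves"]
        if minimum <= shape and shape.isdisjoint(excluded)
    ]
```

A pruned node costs two set comparisons regardless of how many shapes sit beneath it, which is the intuition behind queries with no hits being nearly independent of database size.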
Figure 5
The distribution of the area under the curve (AUC) of the receiver operating characteristic (ROC) curves of various shape-based virtual screening algorithms when applied to all 102 targets in the DUD-E benchmark. An AUC of 0.5 indicates random performance while an AUC of 1.0 indicates a perfect ranking of active ligands. Violin plots show the median value (dot), the range between the first and third quartile (solid block line), and the kernel density from the minimum to maximum values (shaded area).
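For readers unfamiliar with the metric, the ROC AUC equals the probability that a randomly chosen active is scored above a randomly chosen decoy, with ties counted as one half. The sketch below computes it directly from that definition; the function name `roc_auc` and the input format are illustrative, and the quadratic loop is chosen for clarity rather than speed.

```python
def roc_auc(active_scores, decoy_scores):
    """ROC AUC as the fraction of (active, decoy) pairs ranked correctly.

    Higher scores are assumed to mean "more similar to the query".
    Tied pairs contribute one half.
    """
    wins = 0.0
    for a in active_scores:
        for d in decoy_scores:
            if a > d:
                wins += 1.0
            elif a == d:
                wins += 0.5
    return wins / (len(active_scores) * len(decoy_scores))
```

A scoring function that assigns every molecule the same score gets exactly 0.5 under this definition, matching the random-performance baseline in the caption.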
Figure 6
The average time spent per one million shapes for various methods of shape comparison. Values are averaged across the 102 targets of the DUD-E benchmark. Times are plotted on a log scale, and there is an almost two orders of magnitude difference between the fastest and slowest methods.
Figure 7
Distribution of retrieval rates of the top optimized ROCS (a) virtual hits and (b) actives if faster methods of shape comparison are first used to produce a smaller library. Only the top 0.1%, 1%, and 10% of hits as ranked by USR and VAMS are screened with optimized ROCS. The top 0.1% of hits identified by this hybrid method are compared to the top 0.1% of hits identified by a full, more time-consuming, optimized ROCS screen. The fraction of (a) molecular conformations and (b) active compounds (regardless of conformation) identified by the hybrid screen in this top 0.1% set that are identical to those ranked in the top 0.1% by the full screen is shown. A value of one indicates that the hybrid approach identifies an identical set of top hits to a full screen, while a value of zero means that the hybrid approach identified none of the top hits from a full optimized ROCS screen. When measuring retrieval of the conformations of active compounds in this top 0.1%, five benchmarks are omitted since optimized ROCS did not rank any actives this highly. VAMS generally needs to select a set only one tenth the size of that selected by USR to produce equivalently enriched subsets for ROCS screening. Violin plots show the median value (dot), the range between the first and third quartile (solid block line), and the kernel density from the minimum to maximum values (shaded area).
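The retrieval rate described in this caption can be sketched as follows, assuming each method's results are available as a `{molecule_id: score}` dictionary with higher scores being better. The function names and the rounding used to size the top sets are assumptions for illustration, not details taken from the paper.

```python
def top_fraction(scores, fraction):
    """Return the ids of the best-scoring `fraction` of a {id: score} mapping."""
    k = max(1, round(len(scores) * fraction))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])

def hybrid_retrieval_rate(fast_scores, slow_scores, prefilter_fraction, top=0.001):
    """Fraction of the full screen's top hits that the hybrid screen recovers.

    fast_scores: scores from the cheap prefilter over the whole library.
    slow_scores: scores from the expensive method over the whole library
                 (only needed in full here to build the reference set).
    """
    reference = top_fraction(slow_scores, top)
    kept = top_fraction(fast_scores, prefilter_fraction)
    # Rescore only the prefiltered subset with the expensive method.
    rescored = {m: slow_scores[m] for m in kept}
    ranked = sorted(rescored, key=rescored.get, reverse=True)
    hybrid_top = set(ranked[:len(reference)])
    return len(hybrid_top & reference) / len(reference)
```

A rate of 1.0 means the prefilter discarded none of the molecules the full screen would have ranked in its top set, so the cheaper hybrid screen loses nothing at that cutoff.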
Figure 8
The enrichment factor and true positive rate (sensitivity) for various VAMS similarity threshold searches across the 102 targets of the DUD-E benchmark. Both the (a) full results and (b) a magnification of the lower enrichment factor region are shown. Solid marks correspond to the similarity threshold search that had the highest F1 score for a given target. Enrichment factors greater than one indicate better than random performance.
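The three quantities in this caption follow from four counts. The sketch below uses the standard definitions (enrichment factor as the active rate among selected molecules divided by the active rate in the whole library, and F1 as the harmonic mean of precision and recall); the function and key names are illustrative.

```python
def screening_metrics(true_positives, selected, actives, total):
    """Enrichment factor, true positive rate, and F1 score for one search.

    true_positives: actives returned by the search.
    selected:       all molecules returned by the search.
    actives:        all actives in the library (assumed > 0).
    total:          library size.
    """
    recall = true_positives / actives
    precision = true_positives / selected if selected else 0.0
    # (TP / selected) / (actives / total), arranged to avoid rounding noise.
    enrichment = (true_positives * total) / (selected * actives) if selected else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {
        "enrichment_factor": enrichment,
        "true_positive_rate": recall,
        "f1": f1,
    }
```

For instance, a search returning 20 molecules, 5 of them active, from a library of 1000 with 10 actives has an enrichment factor of 25 and a true positive rate of 0.5.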
Figure 9
The average time spent per one million shapes when using different methods for searching volumetric shapes across the DUD-E targets. Although in theory indexing approaches can speed up k-nearest neighbor and similarity threshold (t) searches, in practice only the narrowest of queries provide a performance improvement over a simple linear scan.
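The linear-scan baseline for both query types is short to write down. The sketch below assumes shapes are sets of integer `(i, j, k)` voxel indices and that similarity is the volume Tanimoto coefficient (shared voxels divided by voxels in either shape); that choice of similarity measure and all function names are assumptions for illustration.

```python
import heapq

def tanimoto(a, b):
    """Volume overlap similarity of two voxel sets, in [0, 1]."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0

def threshold_search(database, query, t):
    """Names of all shapes with similarity to `query` of at least `t`."""
    return [
        name
        for name, shape in database.items()
        if tanimoto(query, shape) >= t
    ]

def nearest_neighbors(database, query, k):
    """Names of the `k` shapes most similar to `query`, best first."""
    return heapq.nlargest(
        k, database, key=lambda name: tanimoto(query, database[name])
    )
```

Both functions touch every shape once, so their cost grows linearly with the library; an index only helps when the query is narrow enough to rule out most of the tree.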
Figure 10
The enrichment factor and true positive rate (sensitivity) for various shape constraint searches across the 102 targets of the DUD-E benchmark. Both the (a) full results and (b) a magnification of the lower enrichment factor region are shown. All possible shape constraints using just the ligand (see Figure 2) and both the ligand and receptor shapes (see Figure 3) with gap sizes of 1, 1.5, and 2 Å for the minimum and maximum constraints were considered. Solid marks correspond to the shape constraint search that had the highest F1 score for a given target. Enrichment factors greater than one indicate better than random performance.
Figure 11
The average time spent per one million shapes when using shape constraints to search the DUD-E benchmark using linear scan, where every ligand is evaluated, and an indexing method, where a search tree is used to limit the search.
Figure 12
Performance scaling of shape constraint search. Total search time is shown relative to the size of the shape database for each of the 102 DUD-E targets. (a–c) Ligand-only shape constraints and (d–f) ligand-receptor shape constraints for a variety of gap sizes are shown. Solid marks indicate cases where there was at least one match to the shape constraint query. Empirically, highly specific queries scale sub-linearly, with hitless queries demonstrating nearly constant performance with respect to database size.
Figure 13
The distribution of the area under the curve (AUC) of various shape-based virtual screening algorithms with and without pharmacophoric color information when applied to all 102 targets in the DUD-E benchmark. Violin plots show the median value (dot), the range between the first and third quartile (solid block line), and the kernel density from the minimum to maximum values (shaded area).
Figure 14
Distribution of retrieval rates of the top color-optimized ROCS (a) hits and (b) actives if faster methods of shape comparison are first used to produce a smaller library. The ability of these methods to retrieve the identical molecular conformations ranked in the top 0.1% for each benchmark is measured. When measuring retrieval of the conformations of active compounds in this top 0.1%, one benchmark is omitted since color-optimized ROCS did not rank any actives this highly. Despite lacking color information, VAMS produces better retrieval rates than USR with color information. Violin plots show the median value (dot), the range between the first and third quartile (solid block line), and the kernel density from the minimum to maximum values (shaded area).
