Analysis and use of fragment-occurrence data in similarity-based virtual screening

Shereena M. Arif, John D. Holliday, Peter Willett

Research output: Contribution to journal › Article

25 Citations (Scopus)

Abstract

Current systems for similarity-based virtual screening use similarity measures in which all the fragments in a fingerprint contribute equally to the calculation of structural similarity. This paper discusses the weighting of fragments on the basis of their frequencies of occurrence in molecules. Extensive experiments with sets of active molecules from the MDL Drug Data Report and the World of Molecular Bioactivity databases, using fingerprints encoding Tripos holograms, Pipeline Pilot ECFC_4 circular substructures and Sunset Molecular keys, demonstrate clearly that frequency-based screening is generally more effective than conventional, unweighted screening. The results suggest that standardising the raw occurrence frequencies by taking the square root of the frequencies will maximise the effectiveness of virtual screening. An upper-bound analysis shows the complex interactions that can take place between representations, weighting schemes and similarity coefficients when similarity measures are computed, and provides a rationalisation of the relative performance of the various weighting schemes.
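
As a rough illustration of the weighting idea described in the abstract, the sketch below compares binary, raw-count and square-root-weighted fingerprints using the general (non-binary) form of the Tanimoto coefficient. It is a minimal sketch under stated assumptions, not the authors' implementation: the fragment labels and counts are hypothetical, the helper names `sqrt_weights` and `tanimoto` are invented for this example, and the abstract does not state the exact formula used in the paper.

```python
import math

def sqrt_weights(counts):
    """Replace each raw occurrence count with its square root (assumed weighting)."""
    return {frag: math.sqrt(n) for frag, n in counts.items() if n > 0}

def tanimoto(x, y):
    """General Tanimoto coefficient for non-negative weighted fragment vectors.

    S = sum(x_i * y_i) / (sum(x_i^2) + sum(y_i^2) - sum(x_i * y_i))
    """
    dot = sum(w * y[frag] for frag, w in x.items() if frag in y)
    norm_x = sum(w * w for w in x.values())
    norm_y = sum(w * w for w in y.values())
    denom = norm_x + norm_y - dot
    return dot / denom if denom else 0.0

# Hypothetical fragment-occurrence counts for a reference and a database molecule.
reference = {"c:c": 6, "C-O": 1, "C=O": 1}
candidate = {"c:c": 6, "C-O": 4, "C-N": 1}

# Conventional, unweighted screening: every fragment that is present counts as 1.
binary = tanimoto({f: 1.0 for f in reference}, {f: 1.0 for f in candidate})
# Frequency-based screening with raw counts, then with square-root weights.
raw = tanimoto(reference, candidate)
weighted = tanimoto(sqrt_weights(reference), sqrt_weights(candidate))

print(f"binary={binary:.3f} raw={raw:.3f} sqrt={weighted:.3f}")
```

In this toy case the square-root weights damp the influence of the frequently occurring aromatic-bond fragment relative to raw counts, while still distinguishing the two molecules more finely than the binary fingerprint does.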

Original language: English
Pages (from-to): 655-668
Number of pages: 14
Journal: Journal of Computer-Aided Molecular Design
Volume: 23
Issue number: 9
DOI: 10.1007/s10822-009-9285-0
Publication status: Published - 2009
Externally published: Yes

Keywords

  • Fingerprint
  • Fragment occurrences
  • Ligand-based virtual screening
  • Similarity searching
  • Substructural fragment
  • Tanimoto coefficient
  • Virtual screening
  • Weighting scheme

ASJC Scopus subject areas

  • Drug Discovery
  • Physical and Theoretical Chemistry
  • Computer Science Applications

Cite this

Analysis and use of fragment-occurrence data in similarity-based virtual screening. / Arif, Shereena M.; Holliday, John D.; Willett, Peter.

In: Journal of Computer-Aided Molecular Design, Vol. 23, No. 9, 2009, p. 655-668.

@article{b222d98e2ab74bedaa441f9c6eb22594,
title = "Analysis and use of fragment-occurrence data in similarity-based virtual screening",
keywords = "Fingerprint, Fragment occurrences, Ligand-based virtual screening, Similarity searching, Substructural fragment, Tanimoto coefficient, Virtual screening, Weighting scheme",
author = "Arif, {Shereena M.} and Holliday, {John D.} and Willett, Peter",
year = "2009",
doi = "10.1007/s10822-009-9285-0",
language = "English",
volume = "23",
pages = "655--668",
journal = "Journal of Computer-Aided Molecular Design",
issn = "0920-654X",
publisher = "Springer Netherlands",
number = "9",

}

TY - JOUR

T1 - Analysis and use of fragment-occurrence data in similarity-based virtual screening

AU - Arif, Shereena M.

AU - Holliday, John D.

AU - Willett, Peter

PY - 2009

Y1 - 2009

AB - Current systems for similarity-based virtual screening use similarity measures in which all the fragments in a fingerprint contribute equally to the calculation of structural similarity. This paper discusses the weighting of fragments on the basis of their frequencies of occurrence in molecules. Extensive experiments with sets of active molecules from the MDL Drug Data Report and the World of Molecular Bioactivity databases, using fingerprints encoding Tripos holograms, Pipeline Pilot ECFC_4 circular substructures and Sunset Molecular keys, demonstrate clearly that frequency-based screening is generally more effective than conventional, unweighted screening. The results suggest that standardising the raw occurrence frequencies by taking the square root of the frequencies will maximise the effectiveness of virtual screening. An upper-bound analysis shows the complex interactions that can take place between representations, weighting schemes and similarity coefficients when similarity measures are computed, and provides a rationalisation of the relative performance of the various weighting schemes.

KW - Fingerprint

KW - Fragment occurrences

KW - Ligand-based virtual screening

KW - Similarity searching

KW - Substructural fragment

KW - Tanimoto coefficient

KW - Virtual screening

KW - Weighting scheme

UR - http://www.scopus.com/inward/record.url?scp=69249211345&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=69249211345&partnerID=8YFLogxK

U2 - 10.1007/s10822-009-9285-0

DO - 10.1007/s10822-009-9285-0

M3 - Article

C2 - 19536456

AN - SCOPUS:69249211345

VL - 23

SP - 655

EP - 668

JO - Journal of Computer-Aided Molecular Design

JF - Journal of Computer-Aided Molecular Design

SN - 0920-654X

IS - 9

ER -