The miscreen engine is written in Java, and may therefore be used on
any platform where Java (version 1.4 or higher) is installed. Java is
currently supported on practically all platforms (Windows, Mac, Linux, Unix). The
latest version of Java may be downloaded for free from www.java.sun.com. (You
can find out which version of Java is installed on your machine with the command java -version.) No other software is required to run miscreen.
Functions of the miscreen engine are available from the Windows command prompt or the UNIX command line.
1. Generation of fragments
To generate a bioactivity model, a training set of molecules that are known to be active on the desired target needs to be available. These molecules may be hits from in-house screening, structures from commercial bioactivity databases, or molecules collected from literature, patents or public sources such as PubChem. To get the best results, the training set should of course be as large as possible. The method also provides reliable results, however, with quite limited training sets; in the extreme case a single active molecule can be used as a starting point.
A set of inactive molecules is also required as a reference. In most cases this is simply the set of inactives from the HTS campaign.
When only active molecules are available and no information about inactive molecules can be obtained
(for example when using information about active molecules from
literature, or data from a competitor patent), one can use
a set of average drug-like molecules as the reference (see the next step). In this case only fragments for the active
molecules need to be generated.
Fragments are generated by the following commands:
java -jar miscreen.jar -fragment active_molecules > active.frag
java -jar miscreen.jar -fragment inactive_molecules > inactive.frag
active_molecules and inactive_molecules are files listing the active
and inactive molecules used for the training (of course, you can use whatever file names you want). The molecules are encoded either as SMILES (one molecule per line, with the SMILES tab separated from any additional data, which are ignored) or as an MDL SDfile.
The generated fragments are stored in the files active.frag and inactive.frag (or whatever file names you choose).
Progress of processing will be shown on the screen. Fragmentation of a database with 10,000 molecules will require about 2 minutes.
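The SMILES input layout described above (SMILES first, tab separated from any further data) can be checked with a small script before fragmentation. The following Python sketch is only an illustration and is not part of miscreen; the function name read_smiles_lines is hypothetical, and the code only splits columns, it does not validate the SMILES strings themselves.

```python
def read_smiles_lines(lines):
    """Return (smiles, rest) pairs from tab-separated lines, skipping blank lines."""
    records = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line.strip():
            continue  # ignore empty lines
        # Everything before the first tab is the SMILES; the rest is extra data.
        smiles, _, rest = line.partition("\t")
        records.append((smiles, rest))
    return records


# Hypothetical example input, one molecule per line.
example = [
    "CCO\tethanol\n",
    "c1ccccc1\tbenzene\textra data\n",
    "\n",
    "CC(=O)O\n",
]
records = read_smiles_lines(example)
```

In practice the lines would come from an open file, for example read_smiles_lines(open("active_molecules")).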
2. Development of bioactivity model
The model is developed by analysing the fragment files generated for the active and inactive molecules in the previous step and comparing the distribution of fragments in the two sets.
Use the command:
java -jar miscreen.jar -createmodel -af active.frag -if inactive.frag > project.model
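To give a feeling for what "comparing the distribution of fragments" can mean, the Python sketch below computes a smoothed log-ratio of fragment frequencies in two sets. This is a generic illustration only and is NOT the algorithm used by miscreen, which is not described here; the function name, the pseudocount and the fragment strings are all assumptions, and the real .frag file format is not reproduced.

```python
import math
from collections import Counter


def fragment_weights(active_frags, inactive_frags, pseudocount=1.0):
    """Smoothed log-ratio of fragment frequencies (illustrative, not miscreen's method).

    Each argument is a list with one set of fragment strings per molecule.
    """
    n_act = len(active_frags)
    n_inact = len(inactive_frags)
    # Count in how many molecules of each set a fragment occurs.
    act_counts = Counter(f for mol in active_frags for f in set(mol))
    inact_counts = Counter(f for mol in inactive_frags for f in set(mol))
    weights = {}
    for frag in set(act_counts) | set(inact_counts):
        p_act = (act_counts[frag] + pseudocount) / (n_act + 2 * pseudocount)
        p_inact = (inact_counts[frag] + pseudocount) / (n_inact + 2 * pseudocount)
        # Positive weight: fragment is more frequent among actives.
        weights[frag] = math.log(p_act / p_inact)
    return weights


# Hypothetical toy data: two active and two inactive molecules.
actives = [{"c1ccccc1", "C(=O)N"}, {"c1ccccc1", "CO"}]
inactives = [{"CO"}, {"CCl", "CO"}]
weights = fragment_weights(actives, inactives)
```

Fragments enriched among the actives get a positive weight and fragments enriched among the inactives a negative one; the pseudocount keeps the ratio finite for fragments that occur in only one set.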
When no information about inactive molecules is available, one can instead use
a set of fragments from a large representative
collection of "average drug-like molecules", which may be obtained
from Molinspiration. Sometimes, especially when the training set of
inactive molecules is limited, better results are obtained by using these average drug-like fragments as the reference.
Ready-to-use models for identifying GPCR ligands, ion channel modulators, kinase inhibitors, and nuclear receptor ligands may be provided by Molinspiration.
3. Actual virtual screening
Once a bioactivity model has been generated, the actual virtual screening may be performed with the command:
java -jar miscreen.jar -model model_file -screen file_to_screen [-minscore x] > results
where
model_file is the file with the model generated in step 2
file_to_screen is a file with the molecules to be screened, either with SMILES as the first
item, tab separated from the rest of the data (for example a molecule identifier), or an SDfile.
When using the option -minscore (for example -minscore 0.3), only molecules with
an activity score greater than the specified value will be sent to the output.
Results will be saved into the file results in the form SMILES, activity_score,
additional data (tab separated). You can sort the results according to the activity score, for
example with the UNIX command
sort -k 2,2 -nr results > sorted_results
(older sort implementations accept the equivalent form sort +1 -2 -nr, which current versions may reject).
Molecules with the highest activity score will then be at the top.
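Where the UNIX sort command is not available (for example at the Windows command prompt), the same ordering can be produced with a few lines of Python. The sketch below assumes the results layout described above, with the activity score in the second tab-separated column; the function name sort_by_score and the sample lines are hypothetical.

```python
def sort_by_score(lines, minscore=None):
    """Sort result lines by the score in column 2, highest first.

    If minscore is given, keep only lines with a score greater than it.
    """
    rows = []
    for line in lines:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip empty lines
        fields = line.split("\t")
        score = float(fields[1])  # assumed: second column is the activity score
        if minscore is not None and score <= minscore:
            continue
        rows.append((score, line))
    rows.sort(key=lambda row: row[0], reverse=True)
    return [line for _, line in rows]


# Hypothetical result lines: SMILES, score, identifier.
results = [
    "CCO\t0.12\tmol1\n",
    "c1ccccc1\t0.85\tmol2\n",
    "CC(=O)O\t-0.40\tmol3\n",
]
sorted_results = sort_by_score(results)
top_hits = sort_by_score(results, minscore=0.3)
```

The optional minscore argument mimics the effect of the -minscore option on an already existing results file.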
If the screened file was submitted in SDfile format, calculated bioactivity scores are added to the data section of individual molecules.
-pairs use this option when generating the fragments or performing the screening. This procedure provides better results when the number of active molecules in the training set is relatively small (say less than 50).
Results obtained with the -pairs option and with the -nopairs option cannot be directly compared.
Another option introduced in miscreen 2007.04 is:
-kmwnv this is a mnemonic for 'keep molecules with nonstandard valences'. Molinspiration software is quite "picky" about correct valences, so "exotic" molecules with nonstandard valences are rejected. To process such molecules as well, you may use the -kmwnv option. Do not use this option blindly, however; be aware of what you are doing.
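-multiscreen with this option it is possible to calculate activity scores for all four target classes (GPCR ligands, ion channel modulators, kinase inhibitors and nuclear receptor ligands) in one run. The output contains four numbers: GPCR score, IC score, KI score and NR score (in this order).
Use one of the following commands: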
java -jar miscreen.jar -multiscreensmi 'SMILES'
or
java -jar miscreen.jar -multiscreen smilesFile
All model files (gpcr.model, ki.model ...) must be in the working directory.
In some cases, when building a model for very large data sets (hundreds of thousands of molecules), an OutOfMemoryError is raised. In this case start Java with more memory by using the -mx option on the command line, for example
java -jar -mx1000m miscreen.jar parameters
(current Java versions usually spell this option -Xmx1000m; details depend on your computer system, consult your local Java expert).
Do not edit data files generated by miscreen by hand; the program relies on the specific format of the data.
You can download a simple Perl script screen.pl which automates the screening process. You only have to supply a set of active and inactive molecules (or alternatively a set of reference inactive fragments) and the whole screening is run automatically.
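For readers who prefer to chain the three steps themselves, the Python sketch below assembles the command lines from steps 1 to 3. It is not a port of screen.pl, whose contents are not shown here; the function names build_commands and run_pipeline are hypothetical, and only options documented above are used. The intermediate file names active.frag, inactive.frag and project.model follow the examples in this text.

```python
import subprocess


def build_commands(active, inactive, to_screen, jar="miscreen.jar", minscore=None):
    """Return (argv, output_file) pairs for fragmentation, model building and screening."""
    base = ["java", "-jar", jar]
    screen = base + ["-model", "project.model", "-screen", to_screen]
    if minscore is not None:
        screen += ["-minscore", str(minscore)]
    return [
        (base + ["-fragment", active], "active.frag"),
        (base + ["-fragment", inactive], "inactive.frag"),
        (base + ["-createmodel", "-af", "active.frag", "-if", "inactive.frag"],
         "project.model"),
        (screen, "results"),
    ]


def run_pipeline(commands):
    """Run each command in order, redirecting standard output to its file."""
    for argv, out_path in commands:
        with open(out_path, "w") as out:
            subprocess.run(argv, stdout=out, check=True)


# Only the command lines are built here; nothing is executed.
commands = build_commands(
    "active_molecules", "inactive_molecules", "file_to_screen", minscore=0.3
)
```

Calling run_pipeline(commands) would execute the steps one after another and stop at the first failing command; it requires Java and miscreen.jar in the working directory.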
Molinspiration also offers a Perl script mivalidate.pl which allows validation of the screening methodology. The script divides the data randomly into two halves, a training set and a validation set. The model is developed using the training set, and activities are then predicted for the validation set. Comparing the predicted and the actual data can help you to estimate the screening performance on this particular dataset.
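The random division into two halves can be pictured with the following Python sketch. It is not the code of mivalidate.pl; the function name split_in_halves, the fixed seed and the molecule labels are assumptions made for illustration.

```python
import random


def split_in_halves(records, seed=0):
    """Shuffle a copy of the records and split it into training and validation halves."""
    shuffled = list(records)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]


# Hypothetical data set of ten molecules.
molecules = [f"mol{i}" for i in range(10)]
training, validation = split_in_halves(molecules, seed=42)
```

Every molecule ends up in exactly one of the two sets, so the validation molecules are never seen during model building.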
You may wish to test an interactive calculation of activity scores for several important drug classes available here (choose option [Predict Bioactivity]).
Do not hesitate to contact Molinspiration if you have additional questions or comments or in case you wish to test the miscreen package.
We wish you a lot of superactive hits identified by miscreen !