miscreen - Molinspiration Virtual Screening Engine v2026.02

The Molinspiration miscreen engine enables rapid prediction of biological activity through virtual screening of large molecular collections and selection of the molecules with the highest probability of being active. Screening is based on identifying fragments or substructure features characteristic of active molecules. No information about the receptor's 3D structure is required; a set of active molecules (encoded as SMILES) is sufficient for training. The procedure can therefore also be applied during the early stages of a project, when detailed information about the binding mode is not yet available.

Molinspiration virtual screening is highly efficient (100,000 molecules can be screened in approximately 5 minutes), which makes it practical to process very large molecular libraries. Validation studies performed by our company, as well as results reported by our customers across various target classes (including GPCR targets, nuclear receptors, enzymes, pesticides, and others), demonstrate a 10- to 20-fold increase in hit rates compared with random selection of molecules for screening. Another advantage of the Molinspiration screening procedure is its ability to identify novel active scaffolds that are not present in the training set (so-called "scaffold hopping"), which conventional similarity searching cannot find. See here for more details about the Molinspiration virtual screening protocol.

The miscreen engine is written in Java and can therefore be used on any platform where a Java runtime (version 1.13 or higher) is installed. Java is currently supported on virtually all major platforms (Windows, Mac, Linux, Unix). The latest version of the Java runtime may be downloaded free of charge from various providers. To determine which version of Java is installed on your system, use the command java -version. No additional software or special installation is required to run miscreen.
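The version check mentioned above can also be scripted. The following is a minimal Python sketch, not part of miscreen: the function names parse_java_major and installed_java_major are hypothetical. It relies on two things about the Java launcher: java -version normally writes its banner to standard error, and runtimes up to Java 8 report versions in the legacy form "1.8.0_281", whereas later ones report "11.0.2", "17.0.2" and so on.

```python
import re
import subprocess


def parse_java_major(version_text):
    """Extract the major Java version from the text printed by `java -version`.

    Handles both the legacy "1.x" scheme (e.g. "1.8.0_281" -> 8) and the
    current scheme (e.g. "17.0.2" -> 17).
    """
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_text)
    if match is None:
        raise ValueError("no Java version string found")
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)  # legacy scheme: the major version is the second number
    return int(first)


def installed_java_major(java="java"):
    """Run `java -version` and return the major version of that runtime."""
    result = subprocess.run(
        [java, "-version"], capture_output=True, text=True, check=True
    )
    # The banner is normally written to stderr; stdout is included as a fallback.
    return parse_java_major(result.stderr + result.stdout)
```

Calling installed_java_major() raises FileNotFoundError when no java executable is on the search path, which is itself a useful diagnostic before trying to run miscreen.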

Functions of the miscreen engine are available from the Windows command prompt or UNIX command line.

Using miscreen

To run a virtual screening, a screening model first needs to be created. For this purpose, a training set of molecules known to be active against the target of interest is required. These molecules may include hits from in-house screening campaigns, structures from commercial bioactivity databases, or molecules collected from literature, patents, or public resources such as PubChem or ChEMBL. For best results, the training set should be as large and diverse as possible. However, the method can still provide reliable results with relatively small training sets. In extreme cases, a single active molecule may be used as a starting point.

A set of inactive molecules is also required as a reference. In most cases, this consists simply of inactive compounds from an HTS campaign. When only active molecules are available and no information about inactive molecules can be obtained (for example, when using information from literature sources or competitor patents), a "background" set of representative drug-like molecules may be used as a reference.

The model is created with the command:

java -jar miscreen.jar -act active_molecules -ina inactive_molecules > model_file

active_molecules and inactive_molecules are files containing the lists of active and inactive molecules used to train the model (any filenames may be used). Molecules are encoded as SMILES, one molecule per line. Any additional information on a line must be separated from the SMILES by a tab; such additional data are ignored.
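As an illustration of the input format described above, here is a minimal Python sketch that writes such a file. The helper name write_smiles_file and the example identifiers are hypothetical and not part of miscreen; the sketch only assumes the layout stated above (SMILES first, optional extra fields after tabs, one molecule per line).

```python
from pathlib import Path


def write_smiles_file(path, molecules):
    """Write molecules to a miscreen-style input file.

    `molecules` is an iterable of tuples: (smiles,) or (smiles, extra, ...).
    Each tuple becomes one line, with fields separated by tabs.
    Returns the number of lines written.
    """
    lines = []
    for smiles, *extra in molecules:
        fields = [smiles, *(str(item) for item in extra)]
        # A tab or newline inside a field would corrupt the one-line-per-molecule layout.
        if any("\t" in field or "\n" in field for field in fields):
            raise ValueError(f"field contains a tab or newline: {fields!r}")
        lines.append("\t".join(fields))
    Path(path).write_text("".join(line + "\n" for line in lines), encoding="utf-8")
    return len(lines)


# Example (hypothetical data):
# write_smiles_file("active_molecules", [("CCO", "mol1"), ("c1ccccc1", "mol2")])
```

Each molecule is passed as a tuple, even when it consists of a SMILES string only, so that a bare string is never unpacked character by character.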

The generated model is written to standard output, which the command above redirects to the file model_file (any filename may be used).
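The model-building command can also be launched from a script. The sketch below is a minimal Python wrapper around the command shown above; the function names model_command and build_model are hypothetical, and the assumptions are that miscreen.jar is in the working directory and that java is on the search path. Only the -act and -ina options documented above are used.

```python
import subprocess


def model_command(active_path, inactive_path, jar="miscreen.jar", java="java"):
    """Return the argument list for the documented model-building command."""
    return [java, "-jar", jar, "-act", str(active_path), "-ina", str(inactive_path)]


def build_model(active_path, inactive_path, model_path, **kwargs):
    """Run miscreen and redirect its standard output into `model_path`."""
    cmd = model_command(active_path, inactive_path, **kwargs)
    with open(model_path, "wb") as out:
        # check=True raises CalledProcessError if Java exits with a non-zero status.
        # Whether miscreen signals every failure this way is an assumption.
        subprocess.run(cmd, stdout=out, check=True)
    return model_path


# Example:
# build_model("active_molecules", "inactive_molecules", "model_file")
```

The model file is opened in binary mode so that the bytes produced by miscreen are stored unchanged.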

Actual virtual screening

Once a bioactivity model has been generated, the actual virtual screening can be performed using the command:

java -jar miscreen.jar -model model_file -screen file_to_screen [-minscore x] > results

where

model_file is the model file generated in the first step

file_to_screen is a file containing molecules to be screened, with SMILES as the first item followed by additional tab-separated data (such as a molecule identifier)

When the option -minscore is used (for example -minscore 5.), only molecules with activity scores greater than the specified value are written to the output.

Results will be saved in the file results in the form of SMILES, original input data, and the calculated activity score, separated by tabs.
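The screening step and the results format described above can be handled from Python as in the following minimal sketch. The function names are hypothetical. The parser assumes only what is stated above: fields are separated by tabs, the SMILES is the first field and the activity score is the last one. Sorting by score is an addition for convenience, not something miscreen is documented to do.

```python
import subprocess


def screen_command(model_path, screen_path, minscore=None,
                   jar="miscreen.jar", java="java"):
    """Return the argument list for the documented screening command."""
    cmd = [java, "-jar", jar, "-model", str(model_path), "-screen", str(screen_path)]
    if minscore is not None:
        cmd += ["-minscore", str(minscore)]
    return cmd


def parse_results(text):
    """Parse miscreen output into (smiles, extra_fields, score) tuples.

    Assumes tab-separated lines with the SMILES first and the score last.
    The result is sorted from the highest score to the lowest.
    """
    hits = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip empty lines
        parts = line.split("\t")
        if len(parts) < 2:
            raise ValueError(f"expected at least SMILES and score: {line!r}")
        smiles, *extra, score = parts
        hits.append((smiles, extra, float(score)))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


def run_screen(model_path, screen_path, minscore=None, **kwargs):
    """Run the screening and return the parsed hits (output is held in memory)."""
    cmd = screen_command(model_path, screen_path, minscore, **kwargs)
    done = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return parse_results(done.stdout)


# Example:
# for smiles, extra, score in run_screen("model_file", "file_to_screen", minscore=5.0)[:10]:
#     print(score, smiles, *extra)
```

For very large libraries it may be preferable to redirect the output to a file, as in the command above, and to parse that file line by line instead of capturing everything in memory.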

The procedure described above does not include cross-validation. To obtain a more reliable assessment of model performance, we strongly recommend using cross-validation during the model building process. In our internal projects, Molinspiration typically averages the results of 10 cross-validation runs, each with 80% of the data used for training and 20% for validation. The complete procedure can be easily implemented in Python by calling miscreen.jar, using the train_test_split function from sklearn.model_selection to prepare training and test sets and the roc_curve function from sklearn.metrics to evaluate performance. Molinspiration will be glad to assist with development of a tailored model-building protocol.
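The repeated 80/20 validation described above can be outlined as follows. This is a minimal sketch under stated assumptions, not Molinspiration's own protocol. To keep it dependency-free, it uses small standard-library stand-ins instead of sklearn: split_train_test plays the role of train_test_split, and roc_auc computes the area under the ROC curve directly from pairwise score comparisons. The callback score_fn is hypothetical: it is expected to build a model from the training molecules with miscreen.jar, screen the held-out molecules without -minscore, and return one score per held-out molecule in the same order.

```python
import random
from statistics import mean


def split_train_test(items, test_fraction=0.2, rng=None):
    """Shuffle `items` and return (train, test) with about `test_fraction` held out."""
    rng = rng or random.Random()
    shuffled = list(items)
    rng.shuffle(shuffled)
    n_test = max(1, round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def roc_auc(active_scores, inactive_scores):
    """Area under the ROC curve: the fraction of (active, inactive) pairs in
    which the active molecule scores higher; ties count as one half."""
    wins = 0.0
    for a in active_scores:
        for i in inactive_scores:
            if a > i:
                wins += 1.0
            elif a == i:
                wins += 0.5
    return wins / (len(active_scores) * len(inactive_scores))


def cross_validate(actives, inactives, score_fn, runs=10, seed=0):
    """Average the AUC over `runs` random 80/20 splits.

    `score_fn(train_actives, train_inactives, test_molecules)` is a
    hypothetical callback returning one score per test molecule, in order.
    """
    rng = random.Random(seed)
    aucs = []
    for _ in range(runs):
        train_act, test_act = split_train_test(actives, 0.2, rng)
        train_ina, test_ina = split_train_test(inactives, 0.2, rng)
        scores = score_fn(train_act, train_ina, test_act + test_ina)
        n_act = len(test_act)
        aucs.append(roc_auc(scores[:n_act], scores[n_act:]))
    return mean(aucs)
```

An average AUC close to 0.5 indicates a model that ranks held-out actives no better than chance, while values approaching 1.0 indicate that actives are consistently ranked above inactives. The pairwise AUC calculation is quadratic in the size of the test set, so for large validation sets a library routine such as those in sklearn.metrics is the better choice.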

Additional hints

When building models for very large datasets (hundreds of thousands of molecules), an OutOfMemoryError may occur. In such cases, start Java with a larger maximum heap size using the -Xmx option on the command line, for example:

java -Xmx1000m -jar miscreen.jar parameters

(details depend on your computer system; consult your local Java expert if necessary).

Do not manually edit data files generated by miscreen, as the program relies on a specific data format.

Do not hesitate to contact Molinspiration if you have additional questions, comments, or if you wish to evaluate the miscreen package.