Chemical Descriptors

Like chemists, models make better predictions when they understand the similarities and differences between chemical compounds. The best way to give a model this chemical intuition is to use chemical descriptors.

Chemical descriptors are vectors (collections of numbers) that describe the properties of individual compounds. For example, relevant properties of solvents include polarity and the ability to donate and accept hydrogen bonds.

Ultimately, descriptors let the model navigate the reaction space in a systematic and informed way.
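
As a rough illustration of the idea (not a description of how Yoneda Optimize computes or uses descriptors internally), the sketch below represents a few solvents as two-number vectors and compares them by Euclidean distance. The choice of descriptors, the simple counting rule and the names `descriptors` and `distance` are assumptions made for this example.

```python
import math

# Hypothetical two-number descriptors: (H-bond donors, H-bond acceptors).
# Assumed counting rule: O-H/N-H groups are donors, N/O atoms are acceptors.
descriptors = {
    "methanol": (1, 1),
    "ethanol": (1, 1),
    "acetone": (0, 1),
    "toluene": (0, 0),
}

def distance(a, b):
    """Euclidean distance between the descriptor vectors of two compounds."""
    return math.dist(descriptors[a], descriptors[b])

# A smaller distance means the compounds look more alike to the model.
print(distance("methanol", "ethanol"))
print(distance("methanol", "acetone"))
print(distance("methanol", "toluene"))
```

With only these two descriptors, methanol and ethanol are indistinguishable, which shows why a richer set of descriptors is usually needed to separate closely related compounds.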


Generating Descriptors from SMILES

The simplest way to generate descriptors is using the Generate Descriptors button on the Reaction Parameters page.


You will be prompted to select which parameter you would like to generate descriptors for.


After selecting the parameter, you will need to paste in a SMILES string for each of the possible compounds.


SMILES (Simplified Molecular-Input Line-Entry System) is a notation that represents organic molecules as short text strings.

Most chemical drawing software (ChemDraw, MarvinSketch, BIOVIA Draw) can generate SMILES from a drawn chemical structure. You can also find SMILES in most online chemical databases, as well as in Wikipedia articles.

You can learn more about SMILES on Wikipedia.
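
For orientation, the sketch below lists SMILES strings for a few common solvents and runs a very rough check before they are pasted in. The dictionary `solvent_smiles` and the helper `looks_balanced` are hypothetical names for this example. The check only confirms that parentheses and square brackets are balanced, so it may catch copy-and-paste damage but does not validate the chemistry.

```python
# Well-known SMILES strings for a few common solvents.
solvent_smiles = {
    "water": "O",
    "methanol": "CO",
    "ethanol": "CCO",
    "acetone": "CC(C)=O",
    "acetonitrile": "CC#N",
    "tetrahydrofuran": "C1CCOC1",
    "toluene": "Cc1ccccc1",
}

def looks_balanced(smiles):
    """Return True if parentheses and square brackets are balanced and nested.

    This is a rough sanity check only, not a SMILES parser.
    """
    closing = {")": "(", "]": "["}
    stack = []
    for ch in smiles:
        if ch in "([":
            stack.append(ch)
        elif ch in closing:
            if not stack or stack.pop() != closing[ch]:
                return False
    return not stack

# Print one tab-separated line per compound: name, SMILES, check result.
for name, smiles in solvent_smiles.items():
    status = "ok" if looks_balanced(smiles) else "check"
    print(f"{name}\t{smiles}\t{status}")
```

A dedicated cheminformatics toolkit would give a much stronger validation than this check.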

  1. Name of the parameter

    This column contains the names of the allowed values of the parameter.

    If you edit, add or remove values, make sure the names stay consistent with the Values column on the Reaction Parameters page.


  2. SMILES

    Paste the SMILES strings into this column. The SMILES strings will be used to generate chemical descriptors for the compounds.

  3. Delete Descriptors

    This button deletes the table and its associated descriptors.

Importing Chemical Descriptors

If you have specific insights into the reaction mechanism and how it’s affected by the properties of the compounds, you can design and import your own descriptors.

The descriptors need to be stored in a .csv file with the following format:
  • entries are separated by commas

  • the first line contains the column headings; the column names are not used by Yoneda Optimize

  • the first column contains the names of the compounds, which need to match the Values on the Reaction Parameters page

  • subsequent columns contain the descriptors, i.e. numeric entries
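
A file in this format could be produced with a short script such as the sketch below, which uses Python's standard `csv` module. The compound names, column headings and descriptor values are hypothetical placeholders, and `write_descriptors` is a helper name invented for this example. The names in the first column are assumed to match the Values entered on the Reaction Parameters page.

```python
import csv
import io

# Hypothetical descriptor values for illustration only; replace with your own.
header = ["solvent", "h_bond_donors", "h_bond_acceptors"]
rows = [
    ["methanol", 1, 1],
    ["acetone", 0, 1],
    ["toluene", 0, 0],
]

def write_descriptors(handle, header, rows):
    """Write a heading line followed by one comma-separated line per compound."""
    writer = csv.writer(handle, lineterminator="\n")
    writer.writerow(header)
    for name, *values in rows:
        # Every descriptor must be numeric; float() raises ValueError otherwise.
        writer.writerow([name, *(float(v) for v in values)])

# Write to an in-memory buffer so the result can be inspected before saving.
buffer = io.StringIO()
write_descriptors(buffer, header, rows)
csv_text = buffer.getvalue()
print(csv_text)

# To save to disk instead:
# with open("descriptors.csv", "w", newline="") as f:
#     write_descriptors(f, header, rows)
```

Converting each descriptor with `float()` is a deliberate choice here: a non-numeric entry stops the script with an error before the file is written, rather than producing a file that fails on import.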