Isotopic Profile DataBase (IPDB)

Sadjad F Baygi edited this page Jan 16, 2023 · 18 revisions

NOTE: IDSL.UFA v1.8 requires new IPDBs with the new structure

The annotation step in the IDSL.UFA software package depends on pre-calculated Isotopic Profile DataBases (IPDBs) to efficiently annotate chromatographic peaks with molecular formulas. IPDB libraries can be saved and re-used in similar workflows, which is also necessary for consistency in population-size studies. IPDB objects are R lists consisting of eight primary objects, plus an optional retention time component:

logIPDB: Parameters used to create the IPDB object

AggregatedList: A list of rounded mass and IDs

MassMAIso: A vector of mass of the most abundant isotopologues

MolecularFormula: A vector of molecular formula ions

IsotopicProfile: A list of theoretical isotopic profiles

R13C: A vector of theoretical R13C values

IndexMAIso: A vector of indices of the most abundant isotopologues in the isotopic profiles

IPsize: A vector of number of isotopologues in the isotopic profiles

Retention Time: An optional feature to include retention times of the molecular formulas to annotate using a retention time window
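
As a rough illustration of the structure listed above, the sketch below builds a toy list with the eight component names and checks that the per-formula components line up. Every value is a placeholder rather than a real isotopic profile, and the layout of AggregatedList and of each IsotopicProfile entry is an assumption made for this example. The helper check_ipdb is hypothetical and is not part of IDSL.UFA.

```r
# Toy IPDB-like list. All numbers and names are placeholders for illustration.
toy_ipdb <- list(
  logIPDB = list(note = "placeholder parameters"),
  AggregatedList = list("181" = 1L, "203" = 2L),  # assumed layout: rounded mass -> IDs
  MassMAIso = c(181.07, 203.05),
  MolecularFormula = c("ion_A", "ion_B"),
  IsotopicProfile = list(                          # assumed layout: mass/intensity matrix
    cbind(mass = c(181.07, 182.07), intensity = c(100, 7)),
    cbind(mass = c(203.05, 204.05), intensity = c(100, 7))
  ),
  R13C = c(7, 7),
  IndexMAIso = c(1L, 1L),
  IPsize = c(2L, 2L)
)

# Hypothetical helper: verify the component names and, as an assumption,
# that each per-formula component has one entry per molecular formula ion.
check_ipdb <- function(ipdb) {
  required <- c("logIPDB", "AggregatedList", "MassMAIso", "MolecularFormula",
                "IsotopicProfile", "R13C", "IndexMAIso", "IPsize")
  missing_names <- setdiff(required, names(ipdb))
  if (length(missing_names) > 0) {
    stop("Missing IPDB components: ", paste(missing_names, collapse = ", "))
  }
  n <- length(ipdb$MolecularFormula)
  per_formula <- c("MassMAIso", "IsotopicProfile", "R13C", "IndexMAIso", "IPsize")
  lens <- vapply(ipdb[per_formula], length, integer(1))
  all(lens == n)
}

check_ipdb(toy_ipdb)
```

A check of this kind may be useful before re-using a saved IPDB in a new workflow, although the actual internal layout should be confirmed against an IPDB generated by the package.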

Two approaches are embedded in the IDSL.UFA workflow to generate IPDBs:

IPDBs from Enumerated Chemical Spaces

In many instances, the chemical space of an analysis can be predicted from the sample preparation and instrumental methods. When the boundaries of a chemical space are known, the chemical space can be generated using the enumerated_chemical_space tab in the UFA parameter spreadsheet to detect unknown molecular formulas. The vast enumerated chemical space can be optimized with five intelligent molecular formula prioritization rules and additional user-defined conditional rules. An IPDB can cover up to 10^8 molecular formulas from a chemical space. Prior to performing a complete chemical space enumeration, IDSL.UFA attempts to estimate the time required for the iteration loops to prevent memory overflow.
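
To give a feel for how quickly an enumerated space grows, here is a minimal brute-force sketch in base R. It is not the algorithm used by IDSL.UFA. The element boundaries are hypothetical, and the single filter (at most 2C + 2 hydrogens) is only an example of a user-defined condition, not one of the package's five prioritization rules.

```r
# Hypothetical element boundaries (not IDSL.UFA defaults)
bounds <- list(C = 1:4, H = 0:10, O = 0:2)

# Number of candidate formulas before any rule is applied:
# the product of the number of allowed counts per element
n_candidates <- prod(lengths(bounds))

# Enumerate every combination of element counts
space <- expand.grid(bounds)

# Illustrative user-defined condition: at most 2C + 2 hydrogens
space <- space[space$H <= 2 * space$C + 2, ]

# Build formula strings, omitting elements with a zero count
element_text <- function(symbol, count) {
  ifelse(count == 0, "", ifelse(count == 1, symbol, paste0(symbol, count)))
}
formulas <- paste0(element_text("C", space$C),
                   element_text("H", space$H),
                   element_text("O", space$O))

n_candidates
length(formulas)
```

Because the candidate count is a product over elements, widening any single boundary multiplies the size of the space, which is why pruning rules and a time estimate before full enumeration matter.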

Follow these steps to generate an IPDB from an enumerated chemical space:

  1. Select the chemical space boundaries and the criteria in the enumerated_chemical_space tab in the UFA parameter spreadsheet

  2. Select YES for PARAM0001 and PARAM0002 in the parameters tab in the UFA parameter spreadsheet

  3. Run this command in the R or RStudio console or terminal: IDSL.UFA::UFA_workflow("Address of the UFA parameter spreadsheet")

IPDBs from Formula Sources

IPDBs can be generated using the formula_source tab in the UFA parameter spreadsheet when a number of suspect molecular formulas are available. This IPDB generation approach allows including Retention Time values for narrower screening using a retention time tolerance in addition to isotopic profile screening. Additionally, we generated IPDBs consistent with IDSL.UFA >= 1.8 for the molecular formulas of the following databases:

  1. Blood exposome
  2. EPA CompTox chemicals dashboard
  3. FDA substance registry
  4. IDSL.Exposome
  5. LIPID MAPS
  6. RefMet
  7. PubChem databases

These IPDB libraries can be accessed using this link for positive and negative modes. These IPDB libraries were generated presuming the occurrence of c("[M+H]+", "[M+Na]+", "[M-H2O+H]+") and c("[M-H]-", "[M-H2O-H]-") ionization pathways in positive and negative modes, respectively. Therefore, the number of molecular formula ions in each IPDB is approximately the number of ionization pathways multiplied by the number of intact molecular formulas. Non-carbon-containing compounds are excluded from IPDBs since IDSL.IPA cannot detect non-carbon-containing compounds in the first place.
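
The size estimate above can be sketched in base R as follows. The four input formulas are arbitrary examples, and the carbon test is a simple regular expression written for this illustration, not the filter used by IDSL.UFA.

```r
# Example intact molecular formulas, chosen only for illustration
formulas <- c("C6H12O6", "C8H10N4O2", "NaCl", "H2O")

# Positive-mode ionization pathways named in the text
pos_adducts <- c("[M+H]+", "[M+Na]+", "[M-H2O+H]+")

# Keep carbon-containing formulas: a "C" not followed by a lowercase letter,
# so that Cl, Ca, Cu and similar symbols are not mistaken for carbon
has_carbon <- grepl("C(?![a-z])", formulas, perl = TRUE)
carbon_formulas <- formulas[has_carbon]

# One row per (formula, ionization pathway) pair
ions <- expand.grid(formula = carbon_formulas,
                    adduct = pos_adducts,
                    stringsAsFactors = FALSE)

nrow(ions)  # carbon-containing formulas x ionization pathways
```

This gives the product of the two counts; the text describes the real relation as approximate, so the number of ions in a generated IPDB may differ from it.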

Follow these steps to generate an IPDB from a list of known molecular formulas:

  1. Prepare the list of molecular formulas in a file with .csv/.xlsx/.txt format in one column. The .csv/.xlsx files may have a second column for retention time values in minutes to match peaks using retention times as well. Do not use headers for the .csv/.xlsx files.

  2. Select the parameters in the formula_source tab in the UFA parameter spreadsheet

  3. Select YES for PARAM0001 and PARAM0003 in the parameters tab in the UFA parameter spreadsheet

  4. Run this command in the R or RStudio console or terminal: IDSL.UFA::UFA_workflow("Address of the UFA parameter spreadsheet")
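
Step 1 above can be sketched in base R as shown below: a two-column .csv file without headers, with molecular formulas in the first column and retention times in minutes in the second. The formulas, retention times, and file name are made up for this example.

```r
# Made-up suspect list: formulas plus retention times in minutes
suspects <- data.frame(
  formula = c("C6H12O6", "C8H10N4O2"),
  rt_min = c(1.25, 4.80)
)

# Hypothetical output location; replace with your own path
out_file <- file.path(tempdir(), "formula_source.csv")

# Write without headers, row names, or quotes, as required for the input file
write.table(suspects, out_file, sep = ",",
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# Inspect the file that would be referenced in the formula_source tab
readLines(out_file)

# After filling in the spreadsheet (steps 2 and 3), the workflow is started with:
# IDSL.UFA::UFA_workflow("Address of the UFA parameter spreadsheet")
```

Writing the file with write.table rather than write.csv makes it straightforward to suppress the header row.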

There are more databases from which to extract molecular formulas and create your own IPDBs based on the needs of your analyses. For example, we recommend the following known sources of molecular formulas for human specimens:

  1. MeSH Database: NLM - MeSH ontology and linked compounds in the PubChem database.

  2. PubChem CID-PMID