Molecular formula class detection

Sadjad F Baygi edited this page Feb 12, 2023 · 7 revisions

In chemical analysis, detected compounds often belong to a chemical class with a characteristic substructure pattern, such as lipids, perfluoroalkyl substances (PFAS), polychlorinated biphenyls (PCBs), polybrominated diphenyl ethers (PBDEs), polycyclic aromatic hydrocarbons (PAHs), or phthalates. The molecular formula enumeration method in IDSL.UFA can, in turn, generate molecular formulas that have repeating substructure patterns. To help identify compounds belonging to these classes, use the detect_formula_sets function from the IDSL.UFA package. This function detects chemical classes using two key attributes:

  1. Constant ΔH/ΔC ratios for polymeric (ΔH/ΔC = 2) and cyclic (ΔH/ΔC = 1/2) chain progressions within polymeric and cyclic classes, as shown in Tables S.2 to S.4.

  2. A constant number of carbons and a fixed sum of hydrogens and halogens (Σ(H+Br+Cl+F+I)), which represents classes similar to PCBs and PBDEs, as shown in Table S.5.
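The two attributes can be checked by hand on a few well-known formulas. The following is a minimal base R sketch, not code from IDSL.UFA: count_atoms, hx_and_c, and delta_ratio are hypothetical helper names introduced only for illustration, and the parser assumes plain formulas without parentheses, isotopes, or charges. Following the ratio_delta_HBrClFI_C parameter name, hydrogens and halogens are pooled into one sum.

```r
# Hypothetical helper (not part of IDSL.UFA): atom counts for a plain formula
count_atoms <- function(formula) {
  tokens <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  elements <- sub("[0-9]+$", "", tokens)    # element symbol of each token
  counts <- sub("^[A-Za-z]+", "", tokens)   # trailing digits of each token
  counts[counts == ""] <- "1"               # a missing count means one atom
  vapply(split(as.numeric(counts), elements), sum, numeric(1))
}

# Sum of H and halogen atoms (HX) and number of carbons (C) for one formula
hx_and_c <- function(formula) {
  atoms <- count_atoms(formula)
  hx <- sum(atoms[names(atoms) %in% c("H", "Br", "Cl", "F", "I")])
  carbon <- sum(atoms[names(atoms) == "C"])
  c(HX = hx, C = carbon)
}

# Change in H + halogens divided by change in carbons between two formulas
delta_ratio <- function(formula_a, formula_b) {
  a <- hx_and_c(formula_a)
  b <- hx_and_c(formula_b)
  unname((b["HX"] - a["HX"]) / (b["C"] - a["C"]))
}

# Attribute 1, polymeric progression: one CF2 unit is added
delta_ratio("C3F7O3S", "C4F9O3S")
# Attribute 1, cyclic progression: naphthalene to anthracene
delta_ratio("C10H8", "C14H10")
# Attribute 2: two PCB congeners share the carbon count and the H + halogen sum
hx_and_c("C12H9Cl")
hx_and_c("C12H8Cl2")
```

The first two calls give ratios of 2 and 1/2, and the two PCB formulas return identical carbon counts and H + halogen sums.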

The detect_formula_sets function aggregates a vector of mixed molecular formulas by class to make similar molecular formulas easier to identify. For example, a vector of mixed molecular formulas can be obtained from an aligned annotated molecular formula table to detect related molecular formulas across a study. This approach was used to detect the presence of chlorinated perfluorotriether alcohols (Cl-PFTrEAs) in human specimens from the ST001430 study.
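The idea of aggregating by class can be illustrated with a simplified base R sketch. This is not the algorithm implemented in detect_formula_sets. It only assumes that, along a chain progression with a constant ratio r, the quantity Σ(H+Br+Cl+F+I) - r × C stays constant while all other elements are unchanged. The helpers count_atoms and class_key are hypothetical, the parser handles only plain formulas, and the sketch ignores the mixed.HBrClFI.allowed and max_number_formula_class settings.

```r
# Hypothetical helper (not part of IDSL.UFA): atom counts for a plain formula
count_atoms <- function(formula) {
  tokens <- regmatches(formula, gregexpr("[A-Z][a-z]?[0-9]*", formula))[[1]]
  elements <- sub("[0-9]+$", "", tokens)
  counts <- sub("^[A-Za-z]+", "", tokens)
  counts[counts == ""] <- "1"
  vapply(split(as.numeric(counts), elements), sum, numeric(1))
}

# Key that stays constant along a chain progression with the given ratio:
# the offset sum(H + halogens) - ratio * C plus the counts of all other elements
class_key <- function(formula, ratio) {
  atoms <- count_atoms(formula)
  is_hx <- names(atoms) %in% c("H", "Br", "Cl", "F", "I")
  is_c <- names(atoms) == "C"
  offset <- sum(atoms[is_hx]) - ratio * sum(atoms[is_c])
  rest <- atoms[!is_hx & !is_c]
  paste0("offset=", offset, ";", paste0(names(rest), rest, collapse = ""))
}

formulas <- c("C3F7O3S", "C4F9O3S", "C6F9O3S", "C7ClF14O4", "C10ClF20O4")
keys <- vapply(formulas, class_key, character(1), ratio = 2)

# Group formulas sharing a key and keep groups with at least min_size members
min_size <- 2
groups <- split(formulas, keys)
groups <- groups[lengths(groups) >= min_size]
print(groups)
```

In this sketch the two perfluoroalkyl sulfonates and the two chlorinated ether formulas form separate groups, while C6F9O3S is dropped because no other formula shares its key.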

detect_formula_sets(molecular_formulas, ratio_delta_HBrClFI_C, mixed.HBrClFI.allowed,
		    min_molecular_formula_class, max_number_formula_class, number_processing_threads = 1)

molecular_formulas: a vector of molecular formulas

ratio_delta_HBrClFI_C: one of c(2, 1/2, 0). Use 2 to detect structures with linear carbon chains such as PFAS, lipids, and chlorinated paraffins. Use 1/2 to detect structures with cyclic chains such as PAHs. Use 0 to detect molecular formulas with fixed structures but changing H/Br/Cl/F/I atoms, similar to PCBs and PBDEs.

mixed.HBrClFI.allowed: one of c(TRUE, FALSE). Select FALSE to detect halogen-saturated compounds similar to PFOS, or select TRUE to detect mixed halogenated compounds with hydrogen.

min_molecular_formula_class: minimum number of molecular formulas in each class. This number should be greater than or equal to 2.

max_number_formula_class: maximum number of molecular formulas in each class

number_processing_threads: number of processing threads for multi-threaded computations

## Example
library(IDSL.UFA)
## A vector of mixed molecular formulas
molecular_formulas <- c("C3F7O3S", "C4F9O3S", "C5F11O3S", "C6F9O3S", "C8F17O3S",
                        "C9F19O3S", "C10F21O3S", "C7ClF14O4", "C10ClF20O4", "C11ClF22O4",
                        "C11Cl2F21O4", "C12ClF24O4")
## Detection parameters
ratio_delta_HBrClFI_C <- 2 # to aggregate polymeric classes
mixed.HBrClFI.allowed <- FALSE # to detect only halogen-saturated classes
min_molecular_formula_class <- 2 # minimum number of molecular formulas in each class
max_number_formula_class <- 20 # maximum number of molecular formulas in each class
## Aggregate the molecular formulas into classes
classes <- detect_formula_sets(molecular_formulas, ratio_delta_HBrClFI_C, mixed.HBrClFI.allowed,
                               min_molecular_formula_class, max_number_formula_class,
                               number_processing_threads = 1)