
01_Baseline

Michael Bornholdt edited this page Sep 8, 2021 · 5 revisions

The aim of this first part is to decide on useful evaluation metrics and thereby establish a baseline for how well the classical pipeline performs. Classical feature extraction is done with CellProfiler, followed by the post-processing steps found in the LINCS repository.

Content

  • LINCS dataset
  • Images & location files
  • Pycytominer pipeline
  • Evaluation metrics

LINCS dataset

Details on the LINCS dataset can be found in the repository but the most important details are:

  • A549 cell line
  • 1,571 compounds/perturbations
  • 6 doses per compound
  • 5 replicates of each dose and compound

Size:

  • 5 batches
  • 136 plates
  • 384 wells per plate
  • 9 sites per well

With each well containing ~2,000 cells and ~52,000 wells in total, the LINCS dataset holds around 100 million cells.
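As a quick sanity check on these sizes, the approximate figures can be multiplied out. This is a minimal sketch that uses only the rounded numbers from this page; the variable names are illustrative and not taken from the LINCS repository.

```python
# Rounded figures from this page (approximations, not exact counts).
TOTAL_WELLS = 52_000      # ~52,000 wells across all plates
SITES_PER_WELL = 9        # imaged sites per well
CELLS_PER_WELL = 2_000    # ~2,000 cells per well

COMPOUNDS = 1_571
DOSES_PER_COMPOUND = 6
REPLICATES = 5

# Total imaged sites and total cells in the dataset.
total_sites = TOTAL_WELLS * SITES_PER_WELL
total_cells = TOTAL_WELLS * CELLS_PER_WELL

# Wells occupied by compound treatments (compound x dose x replicate).
treatment_wells = COMPOUNDS * DOSES_PER_COMPOUND * REPLICATES

print(f"sites:           {total_sites:,}")      # 468,000
print(f"cells:           {total_cells:,}")      # 104,000,000
print(f"treatment wells: {treatment_wells:,}")  # 47,130
```

The treatment wells account for most but not all of the ~52,000 wells; the remainder is presumably taken up by control wells, which this page does not break down.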

The different stages of processing, from the images of each site to the aggregated profile (feature space) of a compound, are described in the profiles section of the LINCS repository. The input data (images and location files) are described in the following chapter, while the Pycytominer chapter goes into more detail on the pipeline and the decisions made during post-processing.
