Difference in the data between the CellProfiler data and the Efficientnet data #9

Open
MattiasSehlstedt opened this issue Oct 21, 2021 · 6 comments

@MattiasSehlstedt
Collaborator

Why does the Efficientnet data contain several replicates spread across different "Metadata_Plate" values, while the CellProfiler data does not?

If one loads each of the two datasets and runs the query
display(df[df.Metadata_broad_sample == 'BRD-K05804044-001-06-0'])
then the CellProfiler data returns 6 rows, which differ only in dose concentration.

The Efficientnet data returns 5 rows, which differ in their "Metadata_Plate" and "Metadata_Treatment_Replicate" values.

Why do there seem to be replicates in the Efficientnet data when the data is aggregated per well? And if the Efficientnet values are themselves aggregations, how do they tie into the CellProfiler data and its single row per dose?
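
For anyone reproducing this comparison, here is a minimal sketch of one way to list which metadata columns actually vary among the rows returned for a compound. It runs on a toy DataFrame: the values are made up, `Metadata_dose` is a hypothetical column name, and `varying_metadata` is an illustrative helper, not part of either dataset's tooling. Only `Metadata_broad_sample`, `Metadata_Plate` and `Metadata_Treatment_Replicate` come from this thread.

```python
import pandas as pd


def varying_metadata(df, sample_id):
    """Return the Metadata_* columns that take more than one value
    among the rows belonging to `sample_id`."""
    rows = df[df["Metadata_broad_sample"] == sample_id]
    meta_cols = [c for c in rows.columns if c.startswith("Metadata_")]
    return [c for c in meta_cols if rows[c].nunique(dropna=False) > 1]


# Toy stand-in for the replicate-level (Efficientnet-like) table.
toy = pd.DataFrame({
    "Metadata_broad_sample": ["BRD-K05804044-001-06-0"] * 3 + ["DMSO"],
    "Metadata_Plate": ["P1", "P2", "P3", "P1"],          # made-up plate names
    "Metadata_Treatment_Replicate": [1, 2, 3, 1],
    "Metadata_dose": [10.0, 10.0, 10.0, 0.0],            # hypothetical column
    "feature_0": [0.1, 0.2, 0.3, 0.0],
})

varying = varying_metadata(toy, "BRD-K05804044-001-06-0")
print(varying)
```

If the varying columns are plate and replicate identifiers, the table is likely at a per-replicate stage; if only a dose column varies, the replicates have probably already been collapsed.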

@michaelbornholdt
Contributor

Yeah, this is confusing because those are two different stages of data: level 5 for the first result (CellProfiler) and level 3 for the second (Efficientnet).

See:
https://github.com/broadinstitute/lincs-cell-painting/tree/master/profiles
https://github.com/broadinstitute/neural-profiling/wiki/01_Baseline

Top:
Technical replicates are already aggregated, so the 6 different doses are visible.

Bottom:
Technical replicates are visible here. The other doses have been removed; see my subselection notebook.

@michaelbornholdt
Contributor

michaelbornholdt commented Oct 22, 2021

@MattiasSehlstedt Hope that makes it clear.

Also, if you use @ mentions, I will respond faster next time :)

@MattiasSehlstedt
Collaborator Author

@michaelbornholdt So I guess that means I would either have to MODZ-aggregate your Efficientnet data or work with https://github.com/broadinstitute/lincs-cell-painting/blob/master/profiles/2016_04_01_a549_48hr_batch1.dvc if I want a one-to-one row correspondence between the CellProfiler data and your Efficientnet data?

@michaelbornholdt
Contributor

Yes, correct!
It depends on what kind of analysis you want to do.
If you are running something like enrichment, which compares compounds (level 5), then just aggregate the Efficientnet profiles.
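
As a rough illustration of that aggregation step, the sketch below collapses replicate rows into one consensus row per compound using a plain per-feature median. This is an assumption for illustration: it is not the MODZ weighting mentioned earlier in the thread, and the feature names (`efficientnet_0`, `efficientnet_1`), the toy values and the `consensus_median` helper are all hypothetical.

```python
import pandas as pd


def consensus_median(df, group_cols):
    """Collapse replicate rows to one row per group by taking the
    median of every non-metadata (feature) column."""
    feature_cols = [c for c in df.columns if not c.startswith("Metadata_")]
    return df.groupby(group_cols, as_index=False)[feature_cols].median()


# Toy replicate-level profiles: compound "A" on three plates, "B" on two.
replicates = pd.DataFrame({
    "Metadata_broad_sample": ["A", "A", "A", "B", "B"],
    "Metadata_Plate": ["P1", "P2", "P3", "P1", "P2"],
    "efficientnet_0": [1.0, 2.0, 6.0, 4.0, 8.0],   # hypothetical feature names
    "efficientnet_1": [0.0, 1.0, 2.0, 3.0, 5.0],
})

consensus = consensus_median(replicates, ["Metadata_broad_sample"])
print(consensus)
```

On real data the grouping key would presumably also include a dose column, so that each compound and dose pair yields one row that can be matched against the level 5 CellProfiler table.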

Also, you should be using the spherized CP data instead of the non-spherized data.
