
Prepare data for validation #100

Open
andreasbueckle opened this issue Aug 14, 2024 · 11 comments

Comments

@andreasbueckle
Collaborator

andreasbueckle commented Aug 14, 2024

Goal

Create a scatter graph where each dot is one of the 553 datasets:

Plot as a 2D graph with RUI correctness on the y-axis and CTann correctness on the x-axis.
Each dot is one of the 553 Atlas datasets for which we do have “gold standard” data but pretend not to have it.

RUI-based spatial correctness (y-axis) is computed via % containment: a weighted cosine similarity between the original %AS vector and the predicted %AS vector, each including nowhereland (the empty space of a registration that sticks out of all AS). If predicted = original, the result is 1.

CTann correctness (x-axis) is computed as a weighted cosine similarity between the original CTann and the predicted CTann. If predicted = original, the result is 1.

For each of the 553 datasets, we get both values (each between 0 and 1).
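The issue does not spell out how the cosine is weighted, so here is a minimal sketch of a weighted cosine similarity, assuming nonnegative per-dimension weights `w` chosen by the caller (the function name, the example vectors, and the zero-vector rule are hypothetical). With nonnegative inputs the result stays between 0 and 1, and it is 1 (up to floating-point rounding) when predicted equals original, matching the two properties stated above.

```python
import math
from typing import Sequence


def weighted_cosine(a: Sequence[float], b: Sequence[float], w: Sequence[float]) -> float:
    """sum(w*a*b) / (sqrt(sum(w*a*a)) * sqrt(sum(w*b*b))); weights assumed >= 0."""
    if not len(a) == len(b) == len(w):
        raise ValueError("a, b and w must have the same length")
    dot = sum(wi * ai * bi for ai, bi, wi in zip(a, b, w))
    norm_a = math.sqrt(sum(wi * ai * ai for ai, wi in zip(a, w)))
    norm_b = math.sqrt(sum(wi * bi * bi for bi, wi in zip(b, w)))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: an all-zero vector is not similar to anything
    return dot / (norm_a * norm_b)


# Hypothetical 3-element vectors (e.g., cell-type or %AS proportions) and weights.
original = [0.5, 0.3, 0.2]
predicted = [0.4, 0.4, 0.2]
weights = [1.0, 2.0, 1.0]
score = weighted_cosine(original, predicted, weights)
```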

Legend:

  • Color the 553 dots by organ.
  • Size-code dots by the number of datasets used to make the CTann prediction (differs by AS).
  • Add labels to the best RUI prediction dot, the best CTann prediction dot, the worst CTann prediction dot, and the worst RUI prediction dot.
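Picking the four dots to label is a max/min over each axis. A minimal sketch, assuming each dot carries its dataset ID and both similarity values (the `Dot` type and `dots_to_label` name are hypothetical); ties go to the first dot encountered, because that is how Python's `max` and `min` behave.

```python
from typing import NamedTuple


class Dot(NamedTuple):
    dataset_id: str
    ctann_sim: float  # x-axis
    rui_sim: float    # y-axis


def dots_to_label(dots: list[Dot]) -> dict[str, Dot]:
    """Return the best/worst RUI and CTann dots named in the legend spec."""
    return {
        "best_rui": max(dots, key=lambda d: d.rui_sim),
        "best_ctann": max(dots, key=lambda d: d.ctann_sim),
        "worst_ctann": min(dots, key=lambda d: d.ctann_sim),
        "worst_rui": min(dots, key=lambda d: d.rui_sim),
    }


# Placeholder IDs and values, for illustration only.
sample = [Dot("a", 0.78, 0.98), Dot("b", 0.91, 0.40), Dot("c", 0.20, 0.75)]
labels = dots_to_label(sample)
```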

Data products needed

We need a long dataset with these columns:

| dataset ID | CTann sim | RUI sim | organ | tool | sex |
|---|---|---|---|---|---|
| https://entity.api.hubmapconsortium.org/ancestors/d6e6c8e452ed628425e9e928306a6db0 | 0.78 | 0.98 | heart | azimuth | male |
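One way to produce that long table is one record per dataset, written out as CSV. A minimal sketch using only the standard library; the snake_case column names and the `ValidationRow`/`to_csv` names are hypothetical choices, not the report's actual schema.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ValidationRow:
    dataset_id: str
    ctann_sim: float
    rui_sim: float
    organ: str
    tool: str
    sex: str


def to_csv(rows: list[ValidationRow]) -> str:
    """Serialize rows as CSV text with a header taken from the dataclass fields."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(ValidationRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


example = ValidationRow(
    "https://entity.api.hubmapconsortium.org/ancestors/d6e6c8e452ed628425e9e928306a6db0",
    0.78, 0.98, "heart", "azimuth", "male",
)
text = to_csv([example])
```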

Steps

To compute CTann sim:

To compute RUI sim:

Get tool, organ, sex from https://lod.humanatlas.io/graph/hra-pop/v0.10.2/assets/atlas-enriched-dataset-graph.jsonld.
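The exact layout of that JSON-LD file is not described here, so the following is only an exploration sketch: it downloads the graph with `urllib`, walks every nested JSON object, and keeps the ones that carry all the keys you ask for. Which keys actually hold tool, organ, and sex is an assumption you must check against the file; the function names are hypothetical.

```python
import json
import urllib.request
from typing import Any, Iterator

GRAPH_URL = "https://lod.humanatlas.io/graph/hra-pop/v0.10.2/assets/atlas-enriched-dataset-graph.jsonld"


def load_jsonld(url: str = GRAPH_URL) -> Any:
    """Download and parse the JSON-LD document."""
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def iter_nodes(obj: Any) -> Iterator[dict]:
    """Yield every JSON object nested anywhere in the document."""
    if isinstance(obj, dict):
        yield obj
        for value in obj.values():
            yield from iter_nodes(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from iter_nodes(item)


def select_fields(doc: Any, keys: tuple[str, ...]) -> list[dict]:
    """Keep nodes that carry all requested keys (key names are caller-supplied guesses)."""
    return [{k: node[k] for k in keys} for node in iter_nodes(doc) if all(k in node for k in keys)]
```

For example, `select_fields(load_jsonld(), ("@id", "sex"))` would list every node that has both an `@id` and a `sex` key, if such keys exist in the file.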

@andreasbueckle
Collaborator Author

Use https://github.com/hubmapconsortium/hra-glb-mesh-collisions to get distances between corridors?

@andreasbueckle
Collaborator Author

andreasbueckle commented Aug 14, 2024

With @bherr2, let's write a query to get:

  • dataset_id
  • organ_id
  • organ_label
  • sex
  • tool (could also be sc-proteomics)
  • CTann sim (datasetVsRuiSim)
  • RUI sim (%AS tags in true RUI vs. %AS tags in predicted RUI [take the cell summary of the atlas dataset and return a list of RUI locations; use the highest cosine sim based on CTann]) (ruiVsTopPredictedDatasetSim)
  • RUI sim (CTann of RUI location of input dataset vs predicted RUI location) (ruiVsTopPredictedRuiSim)
  • RUI sim (CTann of input dataset itself vs predicted RUI location) (datasetVsTopPredictedRuiSim)

Later:

  • RUI sim: Euclidean distance between the true RUI location of the input dataset and the predicted RUI location, but we need to check for containment (NEEDS MORE SPECIFICATION)

How to define most similar RUI location:
Given the CTann of the input dataset, which of the 282 atlas RUI locations has the highest cosine sim when comparing its CTann to the one from the input dataset?

How to define most similar dataset:
Given the CTann of the input dataset, which of the 553 atlas datasets (with the same sex, organ, and tool) has the highest cosine sim when comparing its CTann to the one from the input dataset?
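Both definitions are an argmax of cosine similarity over a candidate set; the dataset case just adds a filter. A minimal sketch, assuming CTann summaries are sparse dicts of cell type to proportion (all names and toy values are hypothetical). Excluding the input dataset from its own candidates is an assumption, since otherwise it would always win with a similarity of 1.

```python
import math
from typing import Callable, Mapping, Optional


def cosine(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Cosine similarity of two sparse cell-type vectors (missing cell types count as 0)."""
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def most_similar(
    query: Mapping[str, float],
    candidates: Mapping[str, Mapping[str, float]],
    keep: Callable[[str], bool] = lambda _id: True,
) -> Optional[tuple[str, float]]:
    """Return (candidate_id, similarity) of the best candidate passing `keep`."""
    best = None
    for cand_id, summary in candidates.items():
        if not keep(cand_id):
            continue
        score = cosine(query, summary)
        if best is None or score > best[1]:
            best = (cand_id, score)
    return best


# Most similar RUI location: no filter.
ctann = {"T cell": 0.7, "B cell": 0.3}
rui_locations = {"rui-1": {"T cell": 0.6, "B cell": 0.4}, "rui-2": {"Hepatocyte": 1.0}}
top_rui = most_similar(ctann, rui_locations)

# Most similar dataset: same (organ, sex, tool), excluding the input itself.
meta = {"d1": ("heart", "male", "azimuth"), "d2": ("heart", "male", "azimuth"), "d3": ("kidney", "male", "azimuth")}
datasets = {"d1": ctann, "d2": {"T cell": 0.5, "B cell": 0.5}, "d3": ctann}
top_dataset = most_similar(ctann, datasets, keep=lambda c: c != "d1" and meta[c] == meta["d1"])
```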

@andreasbueckle
Collaborator Author

@bherr2
Let's place this into reports/atlas/validation-v7-ctann-rui

@andreasbueckle
Collaborator Author

andreasbueckle commented Aug 14, 2024

@andreasbueckle
Move all corridor GLBs into one scene, then export it as one GLB.
Then associate scene nodes with RUI location IDs (make a lookup?).

To keep the names:
load with the Blender API, then rename each scene node after its filename (= RUI location ID?). We may need a prefix so that scene node names do not start with a number.
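The steps above might look like this in Blender's Python API. This is an untested sketch: `merge_corridors`, `node_name`, and the `rui_` prefix are hypothetical, it assumes each file stem is the RUI location ID, and it assumes the glTF importer leaves the objects it creates selected. Only `node_name` runs outside Blender.

```python
import os


def node_name(glb_path: str, prefix: str = "rui_") -> str:
    """Name for a corridor's node and mesh: the file stem (assumed to be the
    RUI location ID) without '.glb', prefixed so it never starts with a digit."""
    stem = os.path.splitext(os.path.basename(glb_path))[0]
    return prefix + stem


def merge_corridors(glb_paths: list[str], out_path: str) -> dict[str, str]:
    """Run inside Blender (e.g. `blender -b -P script.py`): import every corridor
    GLB into one scene, rename nodes and meshes, and export a single GLB.
    Returns a lookup from node name to RUI location ID."""
    import bpy  # only available in Blender's bundled Python

    bpy.ops.wm.read_factory_settings(use_empty=True)  # start without the default cube
    lookup = {}
    for path in glb_paths:
        for obj in bpy.context.selected_objects:
            obj.select_set(False)
        bpy.ops.import_scene.gltf(filepath=path)
        name = node_name(path)
        for obj in bpy.context.selected_objects:
            obj.name = name  # Blender appends .001, .002, ... on name clashes
            if obj.data is not None:
                obj.data.name = name  # rename the mesh too, not just the node
        lookup[name] = os.path.splitext(os.path.basename(path))[0]
    bpy.ops.export_scene.gltf(filepath=out_path, export_format="GLB")
    return lookup
```

If a corridor GLB contains more than one object, the clash suffixes mean the lookup should be matched on the name prefix rather than exact equality.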

@andreasbueckle
Collaborator Author

andreasbueckle commented Aug 14, 2024

@bherr2

M and F are right on top of each other, which will mess with the collisions,
although we can probably filter that out in post (i.e., once we know whether the RUI location is M or F, filter out irrelevant collisions).
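The post-filtering idea can be sketched as follows, assuming the collision output can be reduced to pairs of scene node names and that each node's sex is known from its RUI location metadata (the pair format and `same_sex_collisions` name are hypothetical, not the actual output of hra-glb-mesh-collisions).

```python
from typing import Iterable


def same_sex_collisions(
    collisions: Iterable[tuple[str, str]], sex_of: dict[str, str]
) -> list[tuple[str, str]]:
    """Keep only collisions between two scene nodes of the same (known) sex."""
    kept = []
    for a, b in collisions:
        sex_a = sex_of.get(a)
        if sex_a is not None and sex_a == sex_of.get(b):
            kept.append((a, b))
    return kept


# Hypothetical node names: a male corridor overlapping one male and one female corridor.
pairs = [("rui_a", "rui_b"), ("rui_a", "rui_c")]
sexes = {"rui_a": "male", "rui_b": "male", "rui_c": "female"}
filtered = same_sex_collisions(pairs, sexes)
```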

I'm going to load it into Blender and save it as glTF (JSON) to see how the scene is composed. There may be a problem there.
Here is just the JSON part:

```json
{
    "asset":{
        "generator":"Khronos glTF Blender I/O v3.6.27",
        "version":"2.0"
    },
    "scene":0,
    "scenes":[
        {
            "name":"Scene",
            "nodes":[
                491
            ]
        }
    ],
    "nodes":[
        {
            "mesh":0,
            "name":"00087766-0287-467c-9060-b52773db3dce.glb"
        },
        {
            "mesh":1,
            "name":"0016badc-9917-402c-b950-257d77c50b3d.glb"
        },
        {
            "mesh":2,
            "name":"007eb4d9-1694-4380-99e1-4aba832d9227.glb"
        },
        {
            "mesh":3,
            "name":"00f945be-8604-4382-834d-707a37498a9a.glb"
        },
        {
            "mesh":4,
            "name":"016e1d91-9c07-46b7-8441-2975df328fb3.glb"
        },
        {
            "mesh":5,
            "name":"026751c5-ef86-4f35-a810-5f8adc2887a5.glb"
        },
        {
        ...
```

Looks like you need to change the mesh name, not just the scene node name

Also, you should probably strip the .glb from the name.
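A quick check of an exported glTF (JSON) file can catch both problems. This sketch relies only on the glTF 2.0 layout seen above (top-level `nodes` and `meshes` arrays, a node's `mesh` index, optional `name` fields); `misnamed_nodes` is a hypothetical helper.

```python
def misnamed_nodes(gltf: dict, suffix: str = ".glb") -> list[int]:
    """Indices of mesh nodes whose name still ends in '.glb' or differs from their mesh's name."""
    meshes = gltf.get("meshes", [])
    bad = []
    for i, node in enumerate(gltf.get("nodes", [])):
        if "mesh" not in node:
            continue  # parent/empty nodes carry no mesh to compare against
        name = node.get("name", "")
        mesh_name = meshes[node["mesh"]].get("name", "")
        if name.endswith(suffix) or name != mesh_name:
            bad.append(i)
    return bad
```

Load the exported file with `json.load` and pass the resulting dict; an empty list means every mesh node was renamed consistently.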

@andreasbueckle
Collaborator Author

andreasbueckle commented Aug 15, 2024

@bherr2 shares a new report with:

  • rui_location (input dataset) -> gets omitted when returning (PURLs)
  • dataset (input)
  • similar_rui_location (output with highest cosine sim) (PURLs)
  • similarity
  • tool
  • sex

Then Andi computes the %AS tag similarity between the input and predicted RUI locations.

@andreasbueckle
Collaborator Author

andreasbueckle commented Aug 19, 2024

🚧 @andreasbueckle uses https://github.com/x-atlas-consortia/hra-pop/blob/main/output-data/v0.10.3/reports/atlas/validation-v7-x-axis.csv for x-axis
🚧 @andreasbueckle computes %AS tag similarity between rui_location and predicted_rui in https://github.com/x-atlas-consortia/hra-pop/blob/main/output-data/v0.10.3/reports/atlas/validation-v7-y-axis.csv
🚧 @andreasbueckle draws both as a scatter graph, colored by organ and faceted by tool (and sex?)

@andreasbueckle
Collaborator Author

andreasbueckle commented Aug 20, 2024

Update 8/20/24:

  • It's OK if cosine sim is 1.0.
  • Add the "no man's land" of each RUI location, i.e., the part that is not within any AS.
  • Handle cases where total(as-Intersection) > 1.0

For containment:
AGREED ON:
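Step 1: compute the % of the original location's tissue block (TB) within each AS, plus one nowhereland entry.
Step 2: compute the % of the predicted location's TB/corridor within each AS, plus one nowhereland entry.
Step 3: use these percentages in a weighted cosine between the original and predicted vectors.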
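Steps 1 and 2 of this 8/20 agreement can be sketched as building one containment vector per location and aligning the two over the union of AS IDs, ready for the weighted cosine in Step 3. The inputs (intersection volumes per AS and the block volume) and all names are hypothetical. How to handle a total above 1.0 was left open, so rescaling to 100% with zero nowhereland is an assumption.

```python
NOWHERE = "nowhereland"


def containment_vector(intersections: dict[str, float], block_volume: float) -> dict[str, float]:
    """Fraction of a tissue block (or corridor) inside each AS, plus one nowhereland entry.

    `intersections` maps AS ID -> intersection volume with the block.
    """
    pct = {as_id: vol / block_volume for as_id, vol in intersections.items()}
    total = sum(pct.values())
    if total > 1.0:
        # Assumption: overlapping AS meshes can push the total above 100%;
        # rescale to 100% and leave no nowhereland.
        pct = {k: v / total for k, v in pct.items()}
        total = 1.0
    pct[NOWHERE] = 1.0 - total
    return pct


def align(a: dict[str, float], b: dict[str, float]) -> tuple[list[str], list[float], list[float]]:
    """Put two sparse vectors on a shared, sorted key order (missing entries = 0)."""
    keys = sorted(set(a) | set(b))
    return keys, [a.get(k, 0.0) for k in keys], [b.get(k, 0.0) for k in keys]


original = containment_vector({"cortex": 6.0, "medulla": 2.0}, 10.0)
predicted = containment_vector({"cortex": 10.0}, 10.0)
keys, orig_vec, pred_vec = align(original, predicted)
```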

@andreasbueckle
Collaborator Author

andreasbueckle commented Aug 22, 2024

Update 8/22/24:
We cannot use nowhereland, because it implies similarity between extraction sites where there might be none, e.g., RUI 1 sticks out of the kidney and RUI 2 sticks out of the heart: both would share a nonzero nowhereland entry and therefore look similar.
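A small worked example of the problem, using plain (unweighted) cosine and made-up fractions: two RUI locations that share no AS at all still get a similarity of about 0.31 purely from their common nowhereland entry, and 0 once that entry is dropped.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Dimensions: [kidney AS, heart AS, nowhereland] (hypothetical fractions)
rui_1 = [0.6, 0.0, 0.4]  # 40% sticks out of the kidney
rui_2 = [0.0, 0.6, 0.4]  # 40% sticks out of the heart

with_nowhere = cosine(rui_1, rui_2)             # 0.16 / 0.52, about 0.31
without_nowhere = cosine(rui_1[:2], rui_2[:2])  # no shared AS, so 0.0
```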

Instead, let's create a report to capture nowhereland: #105

@andreasbueckle
Collaborator Author

I committed an updated notebook with two scatter graphs, one for each RUI sim measurement: https://github.com/cns-iu/hra-cell-type-populations-supporting-information/blob/main/validations/rui_ctann/rui_ctann_validation.ipynb


2 participants