Skip to content

SCRUM-6110: Ltp expression dto classes and properties#325

Open
chris-grove wants to merge 4 commits into
mainfrom
LTP-Expression-DTO-classes-and-properties
Open

SCRUM-6110: Ltp expression dto classes and properties#325
chris-grove wants to merge 4 commits into
mainfrom
LTP-Expression-DTO-classes-and-properties

Conversation

@chris-grove

Copy link
Copy Markdown
Member

Summary

  • Add three new DTO classes to expression.yaml to support LTP gene-expression ingest:
    • GeneExpressionExperimentDTO — top-level submission class; inlines its child annotations via expression_annotation_dtos
      so submitters don't back-reference the experiment by id
    • GeneExpressionAnnotationDTO — inherits evidence_curie from AnnotationDTO, wraps a single ExpressionPatternDTO
    • ExpressionPatternDTO — pairs when_expressed_dto (TemporalContextDTO) with where_expressed_dto (AnatomicalSiteDTO)
  • Add supporting slots: expression_pattern_dto, when_expressed_dto, where_expressed_dto, detection_reagent_identifiers,
    specimen_genomic_model_identifier, specimen_allele_identifiers, related_figure_identifiers, expression_annotation_dtos
  • Add free-text fallbacks where_expressed_free_text (on AnatomicalSite/AnatomicalSiteDTO) and when_expressed_free_text
    (on TemporalContext/TemporalContextDTO) for cases where controlled terms are insufficient
  • Fill out AnatomicalSiteDTO to mirror the source AnatomicalSite slot set: cellular_component_ribbon_term_curies (new),
    plus cellular_component_other, anatomical_structure_uberon_term_other, anatomical_substructure_uberon_term_other
    (existing boolean slots, domain: removed so they're reusable by both source and DTO)
  • Fix: AnatomicalSiteDTO previously inherited from AuditedObject; corrected to AuditedObjectDTO
  • Wire gene_expression_experiment_ingest_set into the Ingest class and import expression in ingest.yaml
  • Add valid/invalid test fixtures and register them in the Makefile
  • Regenerate generated/jsonschema/allianceModel.schema.json

Test plan

  • make test-jsonschema — 34/34 valid fixtures pass (incl. new gene_expression_test.json)
  • make test-jsonschema_invalid — 5/5 invalid fixtures correctly rejected (incl. new gene_expression_invalid.json, which
    fails on where_expressed_statement, gene_identifier, expression_assay_curie)
  • Confirm downstream consumers (curation-api) can deserialize the new DTOs

chris-grove and others added 2 commits May 26, 2026 15:55
Add DTO classes for GeneExpressionExperiment and GeneExpressionAnnotation,
plus free-text fields on AnatomicalSite/TemporalContext (and their DTOs),
and valid/invalid expression test files.

- Add where_expressed_free_text to AnatomicalSite + AnatomicalSiteDTO.
- Add when_expressed_free_text to TemporalContext + TemporalContextDTO.
- Add ExpressionPatternDTO, GeneExpressionExperimentDTO, and
  GeneExpressionAnnotationDTO. Annotations are inlined under their
  parent experiment via expression_annotation_dtos (inlined_as_list: true)
  so submitters do not need to back-reference experiments by identifier.
- Fix AnatomicalSiteDTO inheritance to AuditedObjectDTO (was AuditedObject).
- Wire gene_expression_experiment_ingest_set into the Ingest class.
- Add gene_expression_test and gene_expression_invalid to the Makefile
  test targets and ship corresponding JSON fixtures.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@claude

claude Bot commented Jun 3, 2026

Copy link
Copy Markdown

Review — PR #325 (gene expression ingest DTOs)

Summary: Adds the ingest side of the gene-expression model — GeneExpressionExperimentDTO, GeneExpressionAnnotationDTO, and ExpressionPatternDTO, with annotations inlined under the experiment via expression_annotation_dtos. Also fills in missing AnatomicalSiteDTO/TemporalContextDTO slots, registers the new gene_expression_experiment_ingest_set, and adds valid/invalid test fixtures (both registered in the Makefile).

Assertions (mechanical) — all clean

Walked the entity↔DTO slot mapping and the suffix rules; nothing to flag:

  • DTO coverage of ExpressionExperiment/ExpressionAnnotation is complete and the suffixes are right — entity_assayedgene_identifier, expression_assay_usedexpression_assay_curie, single_referencereference_curie/evidence_curie, detection_reagentsdetection_reagent_identifiers, cross_referencescross_reference_dtos, cellular_component_ribbon_termscellular_component_ribbon_term_curies, etc.
  • No redundant re-declaration of inherited slots; data_provider_dto/note_dtos/primary_external_id/mod_internal_id come from SubmittedObjectDTO, evidence_curie from SingleReferenceAssociationDTO.
  • AnatomicalSiteDTO is_a: AuditedObjectis_a: AuditedObjectDTO (expression.yaml:315) is a correct fix — it was previously inheriting the persistence base, which is why created_by/updated_by now correctly become created_by_curie/updated_by_curie in the generated artifact.
  • Dropping domain: AnatomicalSite from the three boolean *_other slots (expression.yaml:629-645) is right now that AnatomicalSiteDTO reuses them under the same name (plain-boolean convention).
  • New classes/slots all have descriptions and explicit ranges; required-at-ingest fields (gene/assay/reference/stage-name/where-statement) line up with the FMS "REQUIRED" mapping documented in GeneExpressionAnnotation.notes, so DQMs can supply them.

Questions (judgment calls)

  1. Image associations dropped from the annotation DTO. The entity ExpressionAnnotation carries expression_annotation_image_associations (optional), but GeneExpressionAnnotationDTO has no counterpart. Intentional — i.e. image panes are loaded through a separate path and never via this ingest set? Worth confirming so submitters with figure-image links aren't silently dropping them.
  2. gene_identifier / reference repeated on parent and child. Since expression_annotation_dtos are inlined under the experiment (which already requires gene_identifier + reference_curie), requiring gene_identifier and evidence_curie again on every child annotation duplicates data the submitter must repeat per row. Is that the intended contract, or should the loader propagate the gene/reference down from the parent experiment? (The entity model requires them on the annotation, so this is a loader/ingest-ergonomics call for the curation team.)
  3. Reference slot naming. The experiment uses reference_curie while the annotation uses the inherited evidence_curie for what is the same single supporting Reference. This mirrors the existing AnnotationDTO convention, so probably fine — just flagging the asymmetry in case consumers expect one name.

Overall this looks solid and ready pending the curation-team answers above; the schema mechanics are correct.

@chris-grove chris-grove marked this pull request as draft June 3, 2026 13:07
@chris-grove chris-grove changed the title Ltp expression dto classes and properties SCRUM-6110: Ltp expression dto classes and properties Jun 3, 2026
... to remove unnecessary (and redundant) properties.
Also, cleaned up some notes and typos
@chris-grove

Copy link
Copy Markdown
Member Author

Responding to the questions from Claude Review:

  1. We will later establish a DTO for expression-annotation-image-associations once we have images to submit; not needed right now
  2. This has been fixed by refactoring the ExpressionAnnotation class to inherit directly from Association instead of Annotation and clearing out the unnecessary and redundant properties
  3. Also addressed with the refactor mentioned in number 2

@chris-grove chris-grove marked this pull request as ready for review June 4, 2026 13:04
@chris-grove chris-grove requested a review from cmpich June 4, 2026 14:24
@chris-grove

Copy link
Copy Markdown
Member Author

Just added one change to make single_reference of ExpressionExperiment required

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant