Commit 6f46082

vivbak, daniaki, and jmarshall authored
Custom cohorts in metamist (#615)
* In progress
* In progress
* Cohort GraphQL Skeleton -- functional now
* This should go in another PR
* ignore venv
* Updated docker mariadb setup instructions
* support for `name`, `derived_from` & other fields
* Fix error if applying `ord` to int
* Add all fields to model
* Set empty dict to fix GraphQL non-null error
* Fix error applying `ord` to int
* Add `name` to `cohort` table
* version bump
* Formatting; cohort GQL schema
* cohort layer update
* cohort db table update
* CohortBuilder route
* Single SG ID form
* SG ID from project(s) form
* Cohort Builder page
* Re-usable SG scrolling table
* types definition
* Add link to nav bar to cohort builder
* Add substring search
* Formatting; add assay filter on sample
* Allow hashable assay filter when `meta` present
* Type for project
* Add GQL query to project form
* Add GQL query to ID form
* Handle form submission
* Atomic transaction
* Put query params in POST body
* add APIError type
* Add TODO note; add loader to search button
* merge dupe sgs from search results
* Use auto-gen API; better error handling
* clean up imports
* Make inputs reactive
* Cohort detail view
* import updates
* add detail route
* use with router hooks
* exclude ids input
* navigate to new detail view
* make project required
* Add space between muck and msg
* Get project info
* Convert ids to int to fix not found error
* cohort project resolver
* Update table rendering logic
* Update table rendering logic
* Add search result info in warning box
* Linter fixes
* formatting
* lint fix
* unused import
* Tmp fix for billing query inside NavBar
* Custom cohorts SG query optimisation (#630)
* add assay meta array to type
* Fix rendering issues; add column customisation
* add additional SG filters to GQL schema
* Mod SG SQL query to filter on assay meta and sg timestamps
* Add timestamp and assay field to SG model
* Add assayMeta to GQL fetch
* Use new optimised SG GQL query
* Lint fix
* Change dict value type to `Any`
* Formatting
* Only allow meta which is not `None`
* White space trim
* Add check that new sg is not archived
* remove `assay_meta` in favour of nested assay object
* Procedurally add query joins as required
* remove `assay_meta` in favour of nested assay object; add more filters
* Options to convert/ignore specific fields in `to_sql`
* Handle case when `archived` is an `int`; removed fields
* Updated queries
* mypy/pylint fixes
* Add form inputs for remaining fields
* npm audit
* Remove log
* Trim and remove empty values
* Simplify cram/gvcf query
* Remove merging - not required
* Test additional filters
* test new sg filters
* lint fix
* Change to optional bool
* mypy fix
* change to `assertEqual` for better debugging
* Fail fast if wrong type for `archived`
* Add assay meta to detail view
* update dbbase import
* Update to_sql signature to match superclass. An attempt at pleasing the linter
* Fix click overwriting param to None when not specified
* Create endpoint to create cohort from criteria - Project/s
* Fix query cohort, call connection directly
* Add further criteria to create_cohort_from_criteria endpoint
* Also deletes redundant create_cohort endpoint
* Fix the linter, import order, spaces, etc
* Remove AddFromIdListForm.tsx
* Delete AddFromIDListForm
* Delete custom cohorts UI elements
* Remove link to Cohort Builder from NavBar
* Remove link to custom cohorts page in routes
* Revert package-lock to main
* Handle SG ID exclusion in cohort creation
* Add cohort_template table
* Add create cohort template endpoint
* Fix linting issues including;
* Add basic support for cohort creation from template
* Support query template in graphql, including adding project column
* Handle cohort criteria and cohort template cases
* Handle Templates in the layer and table
* Add skeleton for create_custom_cohort script
* Add dry-run, refactor script to reduce args, fix bug that doesn't allow template to be specified
* Return rich ids in dry mode
* Return rich ids when dry-run is false too
* Add support for rich custom cohort IDs i.e. COHXXX
* Add initial basic create_cohort_from_criteria() tests
* Surely incorrect fix for `Field 'timestamp' doesn't have a default value`
* Fix failing tests, move cohort creation to after dry run exit
* Rename derived_from to template_id, in line with user feedback, to add clarity
* Refactor generating sg_filter, to make room for supporting more inputs
* Fix FK issue in project.xml, delete old, add new
* Handle Sample Type as Cohort Criteria
* D'oh, blackify those parentheses
* Add test case exercising template_id foreign key
* Remove debugging print
* Add tranche of tests that need some sample data
* Add cohort-related tables to SYSTEM VERSIONING and TABLES_ORDERED_BY_FK_DEPS
  This is required so that testbase.py can clear them out successfully.
* Set audit_log_id when INSERTing into cohort tables
  Add audit_log_id field to cohort_template; the others already have it.
* Create cohort rows with timestamp set to (localtime) now
* Further tests for individual CohortCriteria fields
* Validate projects on input for create_template
* Remove what I assume is an artefact from a merge mistake
* Add cohorts to analysis objects
  Co-authored-by: John Marshall <[email protected]>
* Account for cohort_ids in test_get_analysis() test case
* Verify that create_cohort_template now validates projects
* Another test for an individual CohortCriteria field
* Add test exercising re-evaluation of a cohort template
* Add test using all CohortCriteria fields
  Note that these operate as "AND". I hoped to add another sample D/saliva/exome/long-read/ONT and use sg_ids_internal=[sgD], excluded_sgs_internal=[self.sgA] to get [B, D]. However what that actually selects is D-only AND short-read-only, hence matches nothing.
* Update schema docs
* Combine project.xml entries
* Add human readable CTPL prefix to templates
* Fix incorrect call of luhn_compute instead of luhn_is_valid
* Combine John's project.xml entries into Vivian's
* Add sample-type to cohort builder, fix typos
* Return all the cohort details in builder
* Add function to query analyses by cohort
* Catch value error sooner when template ID is invalid
* return None instead of error
* Add tests exercising cohort queries
* Add test_query_cohort
* Cohort -> cohorts
* Fix indent
* Remove passing author to cohortlayer explicitly.
* Add type hints
* Rename clayer and cohortlayer to cohort_layer
* fix lint, author removed. Add newline
* Fix whitespace for linter
* template id should not be nullable
* Remove template_id, specify as strawberry field instead. Rename CohortTemplateModel to CohortTemplate.
* Fix CohortTemplateInternal object is not subscriptable
* Fix failing tests by modifying type hints, dict[str,str] -> CohortTemplateInternal
* Add project ID checks
* Raise ValueError instead of assertion, to ensure it is caught
* Remove author being explicitly passed, use one from connection
* Plural cohort, cohort_template fields
* map to dict later, pass model
* execute -> execute_many
* Fix type of id on Cohort model
* New model for creating cohorts, fix project missing bug, rename Cohort to CohortInternal, fix tests accordingly
* Fix lint, although interesting that my linter didn't catch it
* Fix type hint, cohort_ids should be int not str
* Fix create_analysis, so it can handle no sgs as inputs
* Create two where_strs, remove unused import
* Remove sgs from cohortinternal model!
* Add strict param to id transform function
* redundant, no? remove a type check that already happened above
* Add typing to function, raise value error if neither template nor projects provided
* Fetch assay external ids too
* Move escape_like_terms to utils, apply to contains filter
* Implement Internal and External models for all objects. Move transform rich to raw, to be on the route. Handle raw ids only from layers onward. Update tests accordingly
* Use UTC for test_query_with_creation_date comparisons
  The creation time of the record inserted by upsert_sample() will be reported in UTC, so we need to compare against today in UTC. Otherwise tests fail when run locally before lunchtimeish as it is still "yesterday" in UTC, so lt=today unexpectedly returns the just-created record.
* Add basic tests for scripts/create_custom_cohort.py
  Fix the script's template_id type, reflecting CohortBody's corresponding member's change from str to int in d705285.
* Add separate CohortCriteria/Template.to_internal() tests
  And in all the other tests, use CohortCriteriaInternal directly.
* Switch get_project_write_connection to get_project_readonly_connection as no one will be able to use it at present
* Return [] if no template meets criteria, switch return order of project and templates
* Cohort ID should be None in dry-run mode
* sample_types -> sample_type
* Only run _query_cohort_ids query if :analysis_ids will be non-empty
* Improve mocking in test_cohort_builder.py tests
  Actually call the underlying route so real data can be returned. In create_custom_cohort.py, add a return value for ease of testing and fix get_cohort_spec() type annotations.
* Escape metacharacters in icontains query string (and add tests)
* Make --dry-run an argumentless flag option
* Bump version: 6.9.1 → 6.10.0

---------

Co-authored-by: Daniel Esposito <[email protected]>
Co-authored-by: John Marshall <[email protected]>
Co-authored-by: John Marshall <[email protected]>
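
The commit message notes that the CohortCriteria fields "operate as AND", which is why asking for an explicit long-read sequencing group alongside short-read criteria matched nothing. A minimal sketch of that behaviour, assuming a simplified in-memory model: the `SG` record, its fields, and `select` are hypothetical illustrations, not metamist's actual classes or query code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SG:
    """Hypothetical, simplified sequencing-group record."""

    id: str
    technology: str


def select(sgs, *, technology=None, sg_ids=None, excluded=None):
    """Return IDs of records that satisfy every supplied criterion (logical AND)."""
    result = []
    for sg in sgs:
        if technology is not None and sg.technology not in technology:
            continue
        if sg_ids is not None and sg.id not in sg_ids:
            continue
        if excluded is not None and sg.id in excluded:
            continue
        result.append(sg.id)
    return result


sgs = [
    SG('A', 'short-read'),
    SG('B', 'short-read'),
    SG('D', 'long-read'),
]

# Explicit IDs narrow the selection rather than extend it, so combining
# "only D" with "only short-read" leaves nothing.
print(select(sgs, technology=['short-read'], sg_ids=['D'], excluded=['A']))  # []

# Without the explicit ID list, the exclusion simply removes A.
print(select(sgs, technology=['short-read'], excluded=['A']))  # ['B']
```

Getting `[B, D]` would need the explicit ID list to act as a union with the other criteria, which is a different design from the AND semantics described in the commit message.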
1 parent d2604a8 commit 6f46082

43 files changed, with 2478 additions and 74 deletions

.bumpversion.cfg

+1 -1
@@ -1,5 +1,5 @@
 [bumpversion]
-current_version = 6.9.1
+current_version = 6.10.0
 commit = True
 tag = False
 parse = (?P<major>\d+)\.(?P<minor>\d+)\.(?P<patch>[A-z0-9-]+)

README.md

+2 -2
@@ -36,9 +36,9 @@ It comprises three key components:
 
 As of Jan 15, 2024 this schema should reflect the data structure on the tables:
 
-![Database Structure](resources/2024-01-15_db-diagram.png)
+![Database Structure](resources/schemav7.7.png.png)
 
-You can also find this at [DbDiagram](https://dbdiagram.io/d/Metamist-Schema-v6-6-2-65a48ac7ac844320aee60d16).
+You can also find this at [DbDiagram](https://dbdiagram.io/d/Metamist-Schema-v7-7-6600c875ae072629ced6a1fc).
 
 The codebase contains the following modules worth noting:

api/graphql/filters.py

+10
@@ -18,6 +18,8 @@ class GraphQLFilter(Generic[T]):
     gte: T | None = None
     lt: T | None = None
     lte: T | None = None
+    contains: T | None = None
+    icontains: T | None = None
 
     def all_values(self):
         """
@@ -38,6 +40,10 @@ def all_values(self):
             v.append(self.lt)
         if self.lte:
             v.append(self.lte)
+        if self.contains:
+            v.append(self.contains)
+        if self.icontains:
+            v.append(self.icontains)
 
         return v
 
@@ -53,6 +59,8 @@ def to_internal_filter(self, f: Callable[[T], Any] = None):
                 gte=f(self.gte) if self.gte else None,
                 lt=f(self.lt) if self.lt else None,
                 lte=f(self.lte) if self.lte else None,
+                contains=f(self.contains) if self.contains else None,
+                icontains=f(self.icontains) if self.icontains else None,
             )
 
         return GenericFilter(
@@ -63,6 +71,8 @@ def to_internal_filter(self, f: Callable[[T], Any] = None):
             gte=self.gte,
             lt=self.lt,
             lte=self.lte,
+            contains=self.contains,
+            icontains=self.icontains,
         )
 
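
The new `contains` and `icontains` fields only carry the search term; the commit message mentions that LIKE metacharacters are escaped before the term reaches SQL. The table-layer code is not shown in this section, so the following is only a sketch of the general technique. `escape_like_term`, `contains_pattern`, and `search` are hypothetical names, and an in-memory SQLite database stands in for the real MariaDB backend (case sensitivity of `LIKE` differs between engines and collations).

```python
import sqlite3


def escape_like_term(term: str) -> str:
    """Backslash-escape LIKE metacharacters so they match literally."""
    return (
        term.replace('\\', '\\\\')
        .replace('%', '\\%')
        .replace('_', '\\_')
    )


def contains_pattern(term: str, escape: bool = True) -> str:
    """Build a LIKE pattern that matches `term` anywhere in the column."""
    body = escape_like_term(term) if escape else term
    return f'%{body}%'


conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE cohort (name TEXT)')
conn.executemany(
    'INSERT INTO cohort (name) VALUES (?)',
    [('exome_2024',), ('exomeX2024',), ('100% genome',)],
)


def search(term: str, escape: bool = True) -> list[str]:
    """Run a substring search; the term is always passed as a bound parameter."""
    rows = conn.execute(
        "SELECT name FROM cohort WHERE name LIKE ? ESCAPE '\\' ORDER BY name",
        (contains_pattern(term, escape),),
    )
    return [row[0] for row in rows]


# Escaped: "_" is a literal underscore, so only one name matches.
print(search('e_2'))  # ['exome_2024']

# Unescaped: "_" is a single-character wildcard, so "exomeX2024" matches too.
print(len(search('e_2', escape=False)))  # 2
```

Binding the pattern as a parameter guards against SQL injection, while the escaping step separately stops `%` and `_` in user input from acting as wildcards.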

api/graphql/loaders.py

+5 -2
@@ -364,10 +364,13 @@ async def load_projects_for_ids(project_ids: list[int], connection) -> list[Proj
     """
     pttable = ProjectPermissionsTable(connection)
     projects = await pttable.get_and_check_access_to_projects_for_ids(
-        user=connection.user, project_ids=project_ids, readonly=True
+        user=connection.author, project_ids=project_ids, readonly=True
     )
+
     p_by_id = {p.id: p for p in projects}
-    return [p_by_id.get(p) for p in project_ids]
+    projects = [p_by_id.get(p) for p in project_ids]
+
+    return [p for p in projects if p is not None]
 
 
 @connected_data_loader(LoaderKeys.FAMILIES_FOR_PARTICIPANTS)
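
The loader change above re-orders fetched projects by the requested IDs and then drops any that were not found. A small sketch of that pattern, assuming a stand-in `Project` dataclass and a hypothetical helper `order_by_requested_ids` (neither is the real metamist code), shows how the two return styles differ when an ID is missing:

```python
from dataclasses import dataclass


@dataclass
class Project:
    """Hypothetical stand-in for the real Project model."""

    id: int
    name: str


def order_by_requested_ids(project_ids, projects, drop_missing=True):
    """Return projects in the order of `project_ids`.

    With drop_missing=True the result can be shorter than `project_ids`,
    mirroring the filtered return in the diff above.
    """
    p_by_id = {p.id: p for p in projects}
    ordered = [p_by_id.get(pid) for pid in project_ids]
    if drop_missing:
        return [p for p in ordered if p is not None]
    return ordered


fetched = [Project(2, 'beta'), Project(1, 'alpha')]

kept = order_by_requested_ids([1, 3, 2], fetched)
padded = order_by_requested_ids([1, 3, 2], fetched, drop_missing=False)

print([p.name for p in kept])  # ['alpha', 'beta']
print([p.name if p else None for p in padded])  # ['alpha', None, 'beta']
```

DataLoader-style batch functions generally expect one result per key, in key order. If a requested ID could ever be absent, the filtered form would shift later results onto the wrong keys. Whether that can happen here depends on `get_and_check_access_to_projects_for_ids`, which is not shown in this diff.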

api/graphql/schema.py

+217 -4
@@ -1,6 +1,6 @@
 # type: ignore
 # flake8: noqa
-# pylint: disable=no-value-for-parameter,redefined-builtin,missing-function-docstring,unused-argument
+# pylint: disable=no-value-for-parameter,redefined-builtin,missing-function-docstring,unused-argument,too-many-lines
 """
 Schema for GraphQL.
 
@@ -22,13 +22,15 @@
     AnalysisLayer,
     AnalysisRunnerLayer,
     AssayLayer,
+    CohortLayer,
     FamilyLayer,
     SampleLayer,
     SequencingGroupLayer,
 )
 from db.python.tables.analysis import AnalysisFilter
 from db.python.tables.analysis_runner import AnalysisRunnerFilter
 from db.python.tables.assay import AssayFilter
+from db.python.tables.cohort import CohortFilter, CohortTemplateFilter
 from db.python.tables.project import ProjectPermissionsTable
 from db.python.tables.sample import SampleFilter
 from db.python.tables.sequencing_group import SequencingGroupFilter
@@ -38,6 +40,8 @@
     AnalysisInternal,
     AssayInternal,
     AuditLogInternal,
+    CohortInternal,
+    CohortTemplateInternal,
     FamilyInternal,
     ParticipantInternal,
     Project,
@@ -46,6 +50,11 @@
 )
 from models.models.analysis_runner import AnalysisRunnerInternal
 from models.models.sample import sample_id_transform_to_raw
+from models.utils.cohort_id_format import cohort_id_format, cohort_id_transform_to_raw
+from models.utils.cohort_template_id_format import (
+    cohort_template_id_format,
+    cohort_template_id_transform_to_raw,
+)
 from models.utils.sample_id_format import sample_id_format
 from models.utils.sequencing_group_id_format import (
     sequencing_group_id_format,
@@ -73,6 +82,87 @@ async def m(info: Info) -> list[str]:
 GraphQLEnum = strawberry.type(type('GraphQLEnum', (object,), enum_methods))
 
 
+# Create cohort GraphQL model
+@strawberry.type
+class GraphQLCohort:
+    """Cohort GraphQL model"""
+
+    id: str
+    name: str
+    description: str
+    author: str
+
+    @staticmethod
+    def from_internal(internal: CohortInternal) -> 'GraphQLCohort':
+        return GraphQLCohort(
+            id=cohort_id_format(internal.id),
+            name=internal.name,
+            description=internal.description,
+            author=internal.author,
+        )
+
+    @strawberry.field()
+    async def template(self, info: Info, root: 'Cohort') -> 'GraphQLCohortTemplate':
+        connection = info.context['connection']
+        template = await CohortLayer(connection).get_template_by_cohort_id(
+            cohort_id_transform_to_raw(root.id)
+        )
+
+        return GraphQLCohortTemplate.from_internal(template)
+
+    @strawberry.field()
+    async def sequencing_groups(
+        self, info: Info, root: 'Cohort'
+    ) -> list['GraphQLSequencingGroup']:
+        connection = info.context['connection']
+        cohort_layer = CohortLayer(connection)
+        sg_ids = await cohort_layer.get_cohort_sequencing_group_ids(
+            cohort_id_transform_to_raw(root.id)
+        )
+
+        sg_layer = SequencingGroupLayer(connection)
+        sequencing_groups = await sg_layer.get_sequencing_groups_by_ids(sg_ids)
+        return [GraphQLSequencingGroup.from_internal(sg) for sg in sequencing_groups]
+
+    @strawberry.field()
+    async def analyses(self, info: Info, root: 'Cohort') -> list['GraphQLAnalysis']:
+        connection = info.context['connection']
+        connection.project = root.project
+        internal_analysis = await AnalysisLayer(connection).query(
+            AnalysisFilter(
+                cohort_id=GenericFilter(in_=[cohort_id_transform_to_raw(root.id)]),
+            )
+        )
+        return [GraphQLAnalysis.from_internal(a) for a in internal_analysis]
+
+    @strawberry.field()
+    async def project(self, info: Info, root: 'Cohort') -> 'GraphQLProject':
+        loader = info.context[LoaderKeys.PROJECTS_FOR_IDS]
+        project = await loader.load(root.project)
+        return GraphQLProject.from_internal(project)
+
+
+# Create cohort template GraphQL model
+@strawberry.type
+class GraphQLCohortTemplate:
+    """CohortTemplate GraphQL model"""
+
+    id: str
+    name: str
+    description: str
+    criteria: strawberry.scalars.JSON
+
+    @staticmethod
+    def from_internal(internal: CohortTemplateInternal) -> 'GraphQLCohortTemplate':
+        # At this point, the object that comes in doesn't have an ID field.
+        return GraphQLCohortTemplate(
+            id=cohort_template_id_format(internal.id),
+            name=internal.name,
+            description=internal.description,
+            criteria=internal.criteria,
+        )
+
+
 @strawberry.type
 class GraphQLProject:
     """Project GraphQL model"""
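
The GraphQL models above expose "rich" string IDs (`cohort_id_format`) and convert them back to raw integers before calling the layer (`cohort_id_transform_to_raw`). The real helpers live in `models/utils` and are not part of this section, so the following is only a sketch of the round trip. It assumes a plain prefix-plus-integer scheme with the `COH` and `CTPL` prefixes mentioned in the commit message; the function names are hypothetical, and the actual implementation may add padding or a checksum.

```python
COHORT_PREFIX = 'COH'  # assumption, from the "COHXXX" note in the commit message
TEMPLATE_PREFIX = 'CTPL'  # assumption, from the "CTPL prefix" note


def format_rich_id(raw_id: int, prefix: str = COHORT_PREFIX) -> str:
    """Turn a raw integer ID into a prefixed, human-readable string."""
    if raw_id < 0:
        raise ValueError(f'ID must be non-negative, got {raw_id}')
    return f'{prefix}{raw_id}'


def rich_id_to_raw(rich_id: str, prefix: str = COHORT_PREFIX) -> int:
    """Inverse of format_rich_id; raises ValueError for malformed input."""
    if not rich_id.startswith(prefix):
        raise ValueError(f'Expected an ID starting with {prefix!r}, got {rich_id!r}')
    digits = rich_id[len(prefix):]
    if not digits.isdecimal():
        raise ValueError(f'Expected digits after {prefix!r}, got {rich_id!r}')
    return int(digits)


print(format_rich_id(42))  # COH42
print(rich_id_to_raw('COH42'))  # 42
print(format_rich_id(7, TEMPLATE_PREFIX))  # CTPL7
```

Raising `ValueError` on malformed input follows the commit message's note about raising `ValueError` instead of asserting so that the error is caught. Doing the conversion at the resolver boundary also matches the stated aim of handling raw IDs only from the layers onward.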
@@ -243,6 +333,35 @@ async def analyses(
         )
         return [GraphQLAnalysis.from_internal(a) for a in internal_analysis]
 
+    @strawberry.field()
+    async def cohorts(
+        self,
+        info: Info,
+        root: 'Project',
+        id: GraphQLFilter[int] | None = None,
+        name: GraphQLFilter[str] | None = None,
+        author: GraphQLFilter[str] | None = None,
+        template_id: GraphQLFilter[int] | None = None,
+        timestamp: GraphQLFilter[datetime.datetime] | None = None,
+    ) -> list['GraphQLCohort']:
+        connection = info.context['connection']
+        connection.project = root.id
+
+        c_filter = CohortFilter(
+            id=id.to_internal_filter(cohort_id_transform_to_raw) if id else None,
+            name=name.to_internal_filter() if name else None,
+            author=author.to_internal_filter() if author else None,
+            template_id=(
+                template_id.to_internal_filter(cohort_template_id_transform_to_raw)
+                if template_id
+                else None
+            ),
+            timestamp=timestamp.to_internal_filter() if timestamp else None,
+        )
+
+        cohorts = await CohortLayer(connection).query(c_filter)
+        return [GraphQLCohort.from_internal(c) for c in cohorts]
+
 
 @strawberry.type
 class GraphQLAuditLog:
@@ -472,11 +591,16 @@ async def participant(
 
     @strawberry.field
     async def assays(
-        self, info: Info, root: 'GraphQLSample', type: GraphQLFilter[str] | None = None
+        self,
+        info: Info,
+        root: 'GraphQLSample',
+        type: GraphQLFilter[str] | None = None,
+        meta: GraphQLMetaFilter | None = None,
     ) -> list['GraphQLAssay']:
         loader_assays_for_sample_ids = info.context[LoaderKeys.ASSAYS_FOR_SAMPLES]
         filter_ = AssayFilter(
             type=type.to_internal_filter() if type else None,
+            meta=meta,
         )
         assays = await loader_assays_for_sample_ids.load(
             {'id': root.internal_id, 'filter': filter_}
@@ -607,7 +731,8 @@ async def assays(
         self, info: Info, root: 'GraphQLSequencingGroup'
     ) -> list['GraphQLAssay']:
         loader = info.context[LoaderKeys.ASSAYS_FOR_SEQUENCING_GROUPS]
-        return await loader.load(root.internal_id)
+        assays = await loader.load(root.internal_id)
+        return [GraphQLAssay.from_internal(assay) for assay in assays]
 
 
 @strawberry.type
@@ -696,13 +821,92 @@ async def project(
 
 
 @strawberry.type
-class Query:
+class Query:  # entry point to graphql.
     """GraphQL Queries"""
 
     @strawberry.field()
     def enum(self, info: Info) -> GraphQLEnum:
         return GraphQLEnum()
 
+    @strawberry.field()
+    async def cohort_templates(
+        self,
+        info: Info,
+        id: GraphQLFilter[str] | None = None,
+        project: GraphQLFilter[str] | None = None,
+    ) -> list[GraphQLCohortTemplate]:
+        connection = info.context['connection']
+        cohort_layer = CohortLayer(connection)
+
+        ptable = ProjectPermissionsTable(connection)
+        project_name_map: dict[str, int] = {}
+        project_filter = None
+        if project:
+            project_names = project.all_values()
+            projects = await ptable.get_and_check_access_to_projects_for_names(
+                user=connection.author, project_names=project_names, readonly=True
+            )
+            project_name_map = {p.name: p.id for p in projects}
+            project_filter = project.to_internal_filter(
+                lambda pname: project_name_map[pname]
+            )
+
+        filter_ = CohortTemplateFilter(
+            id=(
+                id.to_internal_filter(cohort_template_id_transform_to_raw)
+                if id
+                else None
+            ),
+            project=project_filter,
+        )
+
+        cohort_templates = await cohort_layer.query_cohort_templates(filter_)
+        return [
+            GraphQLCohortTemplate.from_internal(cohort_template)
+            for cohort_template in cohort_templates
+        ]
+
+    @strawberry.field()
+    async def cohorts(
+        self,
+        info: Info,
+        id: GraphQLFilter[str] | None = None,
+        project: GraphQLFilter[str] | None = None,
+        name: GraphQLFilter[str] | None = None,
+        author: GraphQLFilter[str] | None = None,
+        template_id: GraphQLFilter[int] | None = None,
+    ) -> list[GraphQLCohort]:
+        connection = info.context['connection']
+        cohort_layer = CohortLayer(connection)
+
+        ptable = ProjectPermissionsTable(connection)
+        project_name_map: dict[str, int] = {}
+        project_filter = None
+        if project:
+            project_names = project.all_values()
+            projects = await ptable.get_and_check_access_to_projects_for_names(
+                user=connection.author, project_names=project_names, readonly=True
+            )
+            project_name_map = {p.name: p.id for p in projects}
+            project_filter = project.to_internal_filter(
+                lambda pname: project_name_map[pname]
+            )
+
+        filter_ = CohortFilter(
+            id=id.to_internal_filter(cohort_id_transform_to_raw) if id else None,
+            name=name.to_internal_filter() if name else None,
+            project=project_filter,
+            author=author.to_internal_filter() if author else None,
+            template_id=(
+                template_id.to_internal_filter(cohort_template_id_transform_to_raw)
+                if template_id
+                else None
+            ),
+        )
+
+        cohorts = await cohort_layer.query(filter_)
+        return [GraphQLCohort.from_internal(cohort) for cohort in cohorts]
+
     @strawberry.field()
     async def project(self, info: Info, name: str) -> GraphQLProject:
         connection = info.context['connection']
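
Both new top-level resolvers turn a filter over project names into a filter over project IDs by collecting the names with `all_values()`, resolving them once, and then mapping each filter value through the resulting dictionary. A cut-down sketch of that step, assuming a hypothetical `MiniFilter` with only `eq` and `in_` (the real `GraphQLFilter` has more operators, and its full `to_internal_filter` body is not shown in this diff):

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any, Callable, Generic, TypeVar

T = TypeVar('T')


@dataclass
class MiniFilter(Generic[T]):
    """Hypothetical, cut-down filter with only `eq` and `in_`."""

    eq: T | None = None
    in_: list[T] | None = None

    def all_values(self) -> list[T]:
        """Collect every value mentioned by the filter, for a single bulk lookup."""
        values: list[T] = []
        if self.eq is not None:
            values.append(self.eq)
        if self.in_:
            values.extend(self.in_)
        return values

    def map_values(self, f: Callable[[T], Any]) -> MiniFilter[Any]:
        """Return a new filter with `f` applied to every value."""
        return MiniFilter(
            eq=f(self.eq) if self.eq is not None else None,
            in_=[f(v) for v in self.in_] if self.in_ else None,
        )


# Made-up project names and IDs; in the resolver these come from the
# permission-checked project lookup.
project_name_map = {'proj-a': 1, 'proj-b': 2}

name_filter = MiniFilter(in_=['proj-b', 'proj-a'])
id_filter = name_filter.map_values(lambda pname: project_name_map[pname])

print(name_filter.all_values())  # ['proj-b', 'proj-a']
print(id_filter.in_)  # [2, 1]
```

A name that is missing from the map raises `KeyError` in this sketch. In the resolvers the lookup is preceded by a permission check on the same names, which presumably rejects unknown projects first, although that code is outside this diff.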
@@ -761,6 +965,7 @@ async def sample(
         samples = await slayer.query(filter_)
         return [GraphQLSample.from_internal(sample) for sample in samples]
 
+    # pylint: disable=too-many-arguments
     @strawberry.field
     async def sequencing_groups(
         self,
@@ -772,6 +977,10 @@ async def sequencing_groups(
         technology: GraphQLFilter[str] | None = None,
         platform: GraphQLFilter[str] | None = None,
         active_only: GraphQLFilter[bool] | None = None,
+        created_on: GraphQLFilter[datetime.date] | None = None,
+        assay_meta: GraphQLMetaFilter | None = None,
+        has_cram: bool | None = None,
+        has_gvcf: bool | None = None,
     ) -> list[GraphQLSequencingGroup]:
         connection = info.context['connection']
         sglayer = SequencingGroupLayer(connection)
@@ -812,6 +1021,10 @@ async def sequencing_groups(
                 if active_only
                 else GenericFilter(eq=True)
             ),
+            created_on=created_on.to_internal_filter() if created_on else None,
+            assay_meta=assay_meta,
+            has_cram=has_cram,
+            has_gvcf=has_gvcf,
         )
         sgs = await sglayer.query(filter_)
         return [GraphQLSequencingGroup.from_internal(sg) for sg in sgs]
