[Roadmap]: Gaia Catalog #365
Comments
Is this entity resolution? Are these IDs in a controlled vocabulary or ontology? If so we could
By "cross walking between catalogs" do you mean linking? If so, might that just be using PIDs and
More PIDs? Or is this trying to assign dereferenceable IRIs for resources in the catalog?
Not sure what this is, need more.
Tooling UI stuff like this is hard. An easy entry point is trying to provide spreadsheets with
Again, this is a big project. I'd recommend looking first to things like
Out of my wheelhouse ...
Shopping carts are hard too. Could you provide file serialization options as a parameter to the
Made an image for this discussion at https://github.com/fils/OHDSI-GIS-Metadata-Mapping/tree/master/docs/workflow
For the JSON-LD representation in Doug's workflow we agreed to follow the ESIPFed science-on-schema.org Dataset guide. It supports describing the variables with machine-understandable vocabularies. Depending on the variable type, it supports other properties and their vocabularies, like "unitCode". We could have settled on a different schema.org Dataset profile, like the Croissant extension. Repositories such as HuggingFace and Kaggle support discovery using Croissant metadata that has been married to a set of datasets in the repository, creating a data zone of so-called "Croissant datasets". However, when it comes to discovery with GaiaCatalog, this would not have been fit for purpose. Recall that the GaiaCatalog doesn't contain any data, just metadata. Data and metadata are only married downstream, in a staging database called the Gaia "backbone". This backbone does NOT serve as a repository in the Gaia workflow. Instead, its purpose is to create a product: records in a new OMOP CDM table called external_exposure. There is an ML framework in the OHDSI ecosystem, but it runs on top of the OMOP CDM, not "Croissant datasets".
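To make the variable-level description concrete, here is a minimal Python sketch that assembles a science-on-schema.org style Dataset entry as JSON-LD, with one variable carrying a vocabulary reference and a "unitCode". The dataset name, the variable, and every example.org URI are placeholders made up for illustration; none of them come from the GaiaCatalog.

```python
import json


def make_variable(name, description, property_id, unit_text, unit_code):
    """Build a schema.org PropertyValue describing one measured variable."""
    return {
        "@type": "PropertyValue",
        "name": name,
        "description": description,
        "propertyID": property_id,  # link to a vocabulary term (placeholder URI)
        "unitText": unit_text,
        "unitCode": unit_code,      # link to a unit vocabulary term (placeholder URI)
    }


# Metadata only: the entry describes the data but does not contain it.
entry = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example PM2.5 surface (placeholder)",
    "description": "Metadata-only catalog entry; the data live elsewhere.",
    "variableMeasured": [
        make_variable(
            name="pm25_mean",
            description="Mean PM2.5 concentration per grid cell (hypothetical).",
            property_id="https://example.org/vocab/pm25",
            unit_text="micrograms per cubic metre",
            unit_code="https://example.org/units/microgram-per-cubic-metre",
        )
    ],
}

print(json.dumps(entry, indent=2))
```

In a real entry the placeholder URIs would be replaced with terms from whichever vocabularies we settle on.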
@kzollove, @rtmill, @fils, @tibbben, @diatomsRcool, @AEW0330 would this be a more complete way to describe dataset access in a catalog entry? https://github.com/ESIPFed/science-on-schema.org/blob/main/guides/Dataset.md#distributions This is another feature of the science-on-schema.org Dataset profile that we are recommending. When it comes to variables, following the ESIPFed profile, if the variable data type is numeric, we may want to include schema.measurementTechnique. In addition to text and a URL, schema.measurementTechnique can take a StatisticalVariable. A StatisticalVariable can capture the nuances of an observation, like the ones that Charles and Lara capture in Amadeus. @kzollove it may be something we can capture downstream in the OMOP external_exposure table, in your exposure_relationship_concept_id, which was intended to capture "the spatiotemporal join between exposure and location in a little more detail".
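For discussion, a small Python sketch of what the distributions section of the guide could look like in a catalog entry: each access option becomes a DataDownload with a "contentUrl" and an "encodingFormat". The helper names and the example.org URLs are hypothetical, and this is only one reading of the guide.

```python
import json


def make_distribution(content_url, encoding_format):
    """Describe one way of accessing the dataset as a schema.org DataDownload."""
    return {
        "@type": "DataDownload",
        "contentUrl": content_url,
        "encodingFormat": encoding_format,
    }


def with_distributions(dataset, distributions):
    """Return a copy of the dataset entry with its distribution list set."""
    updated = dict(dataset)
    updated["distribution"] = list(distributions)
    return updated


base_entry = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example dataset (placeholder)",
}

catalog_entry = with_distributions(
    base_entry,
    [
        make_distribution("https://example.org/data/example.csv", "text/csv"),
        make_distribution("https://example.org/data/example.tif", "image/tiff"),
    ],
)

print(json.dumps(catalog_entry, indent=2))
```

Listing one DataDownload per serialization keeps the access options explicit without putting any data in the catalog itself.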
@jaygee-on-github As of now, there are not really any OMOP tables that attempt to capture provenance of a measurement. Would you be thinking of this information being captured in an upstream GaiaCatalog and then "translated" to a vocabulary term downstream? There are examples from the measurement table and domain of preserving information about measurement techniques. They often rely on LOINC or CPT4 codes for standard concepts. You can see some examples in these search results. This is sort of how I imagined an exposure_relationship_concept working. I was thinking mostly about the spatiotemporal join information like you mentioned, but I could see it (or another concept) containing measurement technique information from the source data.
@kzollove, @diatomsRcool it would be great if Charles and Lara already had a vocabulary for measurement techniques. It would be for aggregates what CDEs are for observations. I think Charles had an interest in a vocabulary of StatisticalVariables for this reason. In Amadeus I think Lara has such a field, but it may not be structured. StatisticalVariable can provide structured metadata for a measurement technique.
@kzollove @tibbben @fils @AEW0330 @diatomsRcool: Following the science-on-schema.org Dataset profile, Doug created a catalog entry from one of Tim's GDSC catalog entry examples. This example is a set of land parcels and their characteristics. The entry is a little sketchy. Also, at the variable level it includes two ways of describing a variable. In one, the variable description takes a "PropertyValue". In the other, it takes a "StatisticalVariable". Each way provides a "definition of the calculated geospatial exposure metric". The "PropertyValue" path supports simple definitions. The "StatisticalVariable" path supports complex definitions. It is possible to mix and match. Doug calls this example the "0th order draft". Think of Doug's example as a template we can continue to work on using Tim's example. Alternatively, we can use it as a guide which we can fill with other examples. Tim has provided a second example from the GDSC catalog. It is a PM2.5 example. We propose to have a couple of graduate students fill in the GaiaCatalog entry "template" Doug has proposed, first with the PM2.5 example and then with one or more other examples, including an SDoH one. Something that Tim covers in a GDSC catalog entry that is not supported in Doug's template at the moment is the provenance of the dataset that is the subject of the catalog entry. We would like to defer support for provenance in the template for now. Additionally, we are contemplating writing a paper about the GaiaCatalog as we use it more with geospatial datasets.
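To illustrate the "mix and match" point, here is a rough Python sketch of one entry whose variableMeasured list holds both kinds of description. This is not Doug's template: the variable names, the example.org URI, and the choice of StatisticalVariable properties (measuredProperty, statType, measurementTechnique, as I understand them from schema.org) are assumptions to be checked against the actual draft.

```python
import json

# Simple path: a PropertyValue with a plain-text definition.
simple_variable = {
    "@type": "PropertyValue",
    "name": "parcel_area",
    "description": "Area of the land parcel (hypothetical variable).",
    "unitText": "square metres",
}

# Complex path: a StatisticalVariable that spells out how the aggregate is built.
complex_variable = {
    "@type": "StatisticalVariable",
    "name": "pm25_annual_mean",
    "description": "Annual mean PM2.5 within the parcel (hypothetical variable).",
    "measuredProperty": {"@id": "https://example.org/vocab/pm25"},
    "statType": "mean",
    "measurementTechnique": "Area-weighted average of gridded estimates (placeholder text).",
}

entry = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Example parcel dataset (placeholder)",
    "variableMeasured": [simple_variable, complex_variable],
}


def variable_kinds(dataset):
    """List the @type of each variable description in a catalog entry."""
    return [variable["@type"] for variable in dataset["variableMeasured"]]


print(variable_kinds(entry))
print(json.dumps(entry, indent=2))
```

A helper like variable_kinds could let a reviewer see at a glance which variables in a student-filled entry still use the simple path.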
@fils Doug, would you agree with the following: A variableMeasured in a science-on-schema.org Dataset can take a StatisticalVariable when it is an aggregate. Alternatively, an aggregate can take a PropertyValue, just like an observation or measurement does. The Croissant Dataset profile is an alternative to the science-on-schema.org Dataset profile when the datapoints in a Dataset are not aggregates. Recall that in the Croissant profile a Dataset distribution takes a cr:FileObject or a cr:FileSet, and a cr:FileObject can be the source of a cr:RecordSet, which contains one or more cr:Field entries. Arguably a cr:Field in Croissant does not host aggregates the way a variable in the science-on-schema.org Dataset profile can.
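For readers less familiar with Croissant, a heavily simplified Python sketch of the structure described above: a cr:FileObject in the distribution, a cr:RecordSet, and a cr:Field that points back to the file it is extracted from. The prefixed keys are a shorthand chosen for this sketch and have not been validated against the Croissant specification; the file name, column, and URL are placeholders.

```python
# Abbreviated Croissant-like structure; real Croissant metadata uses a much
# larger JSON-LD context and additional required properties.
croissant_like = {
    "@context": {
        "@vocab": "https://schema.org/",
        "cr": "http://mlcommons.org/croissant/",
    },
    "@type": "Dataset",
    "name": "example-dataset",
    "distribution": [
        {
            "@type": "cr:FileObject",
            "@id": "observations.csv",
            "contentUrl": "https://example.org/observations.csv",
            "encodingFormat": "text/csv",
        }
    ],
    "cr:recordSet": [
        {
            "@type": "cr:RecordSet",
            "@id": "observations",
            "cr:field": [
                {
                    "@type": "cr:Field",
                    "@id": "observations/pm25",
                    "cr:source": {
                        "cr:fileObject": {"@id": "observations.csv"},
                        "cr:extract": {"cr:column": "pm25"},
                    },
                }
            ],
        }
    ],
}


def field_sources(dataset):
    """Map each Field @id to the @id of the FileObject it is read from."""
    sources = {}
    for record_set in dataset.get("cr:recordSet", []):
        for field in record_set.get("cr:field", []):
            sources[field["@id"]] = field["cr:source"]["cr:fileObject"]["@id"]
    return sources


print(field_sources(croissant_like))
```

The point of the sketch is that a Field is tied to a column in a concrete file, which is why it describes datapoints rather than a definition of an aggregate.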
Perhaps useful as examples: the GDSC catalog on Box (for this, focus on the "attributes" and "table_name" columns): The SOLR endpoint (global PM2.5 as a fairly complete example):
Note: the gdsc_tablename can be edited with any table_name from the Box file. For example, the following is for table name "fl_2020_svi_county" (a less complete example):
NOTE: currently the attribute metadata is serialized as (this is a draft): And you can explore the metadata and some of the geographies at: or simply:
Using Tim's examples, this week we launched an activity staffed by a couple of APHRC graduate students that will use the template Doug developed. Recall it is based on the science-on-schema.org Dataset profile. However, the GDSC and other catalogs support provenance. With this in mind, in the future we will be considering another Dataset profile Doug has contributed to: Croissant. We are also mindful of the RO-Crate work that Kyle is exploring.
@tibbben, @fils, @p-talapova, @kzollove: Have we thought about how we might select concepts for the variables we include when filling in the variable part of our catalog entry template? I am thinking we can't use ATHENA because the new vocabularies and their concepts are in a delta potentially filled with so-called community vocabularies. @fils: What would be the level of effort to throw these vocabularies into a graph? And a different question: @fils: could we perform any validation on the structures of these vocabularies if we did graph them? Finally, @fils, tell me if these questions are too sketchy to answer.
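To give a feel for what "validation on the structures" could mean, here is a minimal Python sketch that treats a vocabulary as a graph of concepts with "broader" links and checks two things: links to concepts that do not exist, and concepts that sit on a cycle. The concept identifiers are invented, and the dictionary representation is an assumption; a real effort would more likely load the vocabularies into a triple store and use a shapes language.

```python
def validate_vocabulary(concepts):
    """Check a vocabulary given as {concept_id: [broader_concept_id, ...]}.

    Returns (dangling, cyclic): dangling is a sorted list of (concept, missing
    parent) pairs; cyclic is a sorted list of concepts that can reach themselves
    by following broader links.
    """
    dangling = sorted(
        (concept, parent)
        for concept, parents in concepts.items()
        for parent in parents
        if parent not in concepts
    )

    cyclic = set()
    for start in concepts:
        stack = list(concepts[start])
        seen = set()
        while stack:
            node = stack.pop()
            if node == start:
                cyclic.add(start)
                break
            if node in seen or node not in concepts:
                continue
            seen.add(node)
            stack.extend(concepts[node])

    return dangling, sorted(cyclic)


# Hypothetical community vocabulary with one misspelled parent and one loop.
vocabulary = {
    "exposure": [],
    "air_quality": ["exposure"],
    "pm25": ["air_quality"],
    "ozone": ["air_qualty"],  # misspelled parent: a dangling link
    "loop_x": ["loop_y"],
    "loop_y": ["loop_x"],
}

dangling, cyclic = validate_vocabulary(vocabulary)
print("dangling:", dangling)
print("cyclic:", cyclic)
```

Checks of this kind are cheap once the vocabularies are in any graph form, so the main effort would be in getting the delta vocabularies into a consistent representation.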
Metadata federation and catalog interoperability:
Intuitive interface to create catalog metadata entries:
Intuitive interface to explore the catalog and select data for use: