
sierra-moxon (Collaborator) commented Sep 17, 2025

@lmlui
@simroux - have a look at some of this test data we assembled with the help of @lmlui. We're wondering how to map geNomad plasmid and prophage/provirus output to the Feature class in the KBase CDM. This was our first attempt. We also added sample data, with our questions, to the tests/data dir to document how we did this.

simroux commented Sep 18, 2025

Hi @sierra-moxon,

I'm not familiar with the KBase CDM, so I need to brush up :-) But looking at these files, here are a few things to keep in mind:

  • Plasmids will look easier to deal with at first, because each plasmid prediction should always be a whole, individual contig, i.e. the geNomad output should be a list of contig ids that may seem logical to import as a "contigcollection".
  • Virus predictions are more complex, because they will be a mix of full contigs and "proviruses" (or "prophages"), i.e. virus regions identified within a host contig. You will recognize them by "Provirus" in the topology column, and they will have coordinates in the "coordinates" column (instead of NA).
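A minimal sketch of how the two kinds of virus prediction could be separated when reading the geNomad virus summary table. It assumes a tab-separated file with a header row containing `seq_name`, `topology` and `coordinates` columns (only the last two are mentioned above; `seq_name` is an assumption), and that provirus coordinates are written as `start-end`. The `VirusPrediction` class, the `parse_virus_summary` function and the example rows are hypothetical, not real geNomad output or part of any CDM tooling.

```python
import csv
import io
from dataclasses import dataclass
from typing import Optional


@dataclass
class VirusPrediction:
    """Hypothetical container for one row of a geNomad virus summary."""
    seq_name: str
    is_provirus: bool
    start: Optional[int]  # None when the prediction covers the whole contig
    end: Optional[int]


def parse_virus_summary(tsv_text: str) -> list[VirusPrediction]:
    """Split virus predictions into proviruses and whole-contig calls.

    Assumes columns `seq_name`, `topology` and `coordinates`, where
    `coordinates` is "NA" for whole contigs and "start-end" for proviruses.
    """
    predictions = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        if row["topology"] == "Provirus":
            # Proviruses are regions of a host contig, so keep their interval.
            start_str, end_str = row["coordinates"].split("-")
            predictions.append(
                VirusPrediction(row["seq_name"], True, int(start_str), int(end_str))
            )
        else:
            # Coordinates are NA: the prediction is the entire contig.
            predictions.append(VirusPrediction(row["seq_name"], False, None, None))
    return predictions


# Hypothetical two-row table (made up for illustration, not real geNomad output).
example = (
    "seq_name\ttopology\tcoordinates\n"
    "contig_1\tNo terminal repeats\tNA\n"
    "contig_2_provirus\tProvirus\t101-5000\n"
)
for prediction in parse_virus_summary(example):
    print(prediction)
```

Checking `topology` rather than testing `coordinates` for "NA" keeps the provirus decision tied to the column geNomad uses to label it.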

Taking a quick look at https://kbase.github.io/cdm-schema , I think prophages naturally fit as features, i.e. "A feature localized to an interval along a contig.". To keep things consistent, I wonder if all viruses and plasmids should be defined as features, with plasmids and non-provirus viruses each being a feature that spans the whole contig? That would keep these predictions flexible in the future, e.g. if geNomad (or another tool) that can predict integrated plasmids is used, or even if a given virus prediction gets refined from a full contig to a provirus. It looks like the terms "SO_0001041" and "SO_0000155" would work?
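The "everything is a feature" proposal above can be sketched as follows: a provirus keeps its interval, while a plasmid or whole-contig virus becomes a feature from position 1 to the contig length. This assumes SO_0001041 (viral_sequence) for viruses and SO_0000155 (plasmid) for plasmids, and 1-based inclusive coordinates. `FeatureRecord` and `prediction_to_feature` are hypothetical and deliberately simplified; they are not the actual CDM Feature schema.

```python
from dataclasses import dataclass
from typing import Optional

# Sequence Ontology terms from the suggestion above (assumed mapping).
VIRUS_SO_TERM = "SO_0001041"    # viral_sequence
PLASMID_SO_TERM = "SO_0000155"  # plasmid


@dataclass
class FeatureRecord:
    """Hypothetical, simplified stand-in for a CDM Feature (not the real schema)."""
    contig_id: str
    start: int  # assumed 1-based, inclusive
    end: int    # assumed 1-based, inclusive
    type: str


def prediction_to_feature(
    contig_id: str,
    contig_length: int,
    kind: str,
    coordinates: Optional[tuple[int, int]] = None,
) -> FeatureRecord:
    """Represent a plasmid or virus prediction as a feature on its contig.

    Proviruses pass their (start, end); whole-contig predictions pass None
    and get a feature covering 1..contig_length.
    """
    if kind == "plasmid":
        so_term = PLASMID_SO_TERM
    elif kind == "virus":
        so_term = VIRUS_SO_TERM
    else:
        raise ValueError(f"unknown prediction kind: {kind!r}")
    start, end = coordinates if coordinates is not None else (1, contig_length)
    if not 1 <= start <= end <= contig_length:
        raise ValueError(f"interval {start}-{end} does not fit on {contig_id}")
    return FeatureRecord(contig_id, start, end, so_term)


# Hypothetical contigs and lengths, for illustration only.
whole_plasmid = prediction_to_feature("contig_1", 40000, "plasmid")
provirus = prediction_to_feature("contig_2", 120000, "virus", (101, 5000))
print(whole_plasmid)
print(provirus)
```

With this shape, refining a virus from a full contig to a provirus only changes `start` and `end`, not the kind of record, which is the flexibility described above.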

Let me know if that makes sense. I can also provide geNomad output examples that have provirus predictions (it looks like there were none in Lauren's data?).

ialarmedalien (Collaborator) commented

I added some constraints around CDM identifiers in the last schema update, so I will update your data accordingly.
