ENH: Added action normalize that can normalize a FeatureTable[Frequency] by gene length, library size and composition. #44

Closed
27 commits
c22c0ba
added action for normalization
VinzentRisch Jan 16, 2024
00b45c8
merge with main
VinzentRisch Jan 22, 2024
926f876
added tpm normalize with cli integration
VinzentRisch Jan 25, 2024
e713e5e
adding tpm python implementation
VinzentRisch Jan 26, 2024
358ba73
added tpm python integration
VinzentRisch Feb 1, 2024
315e5a1
added GeneLength type
VinzentRisch Feb 1, 2024
8a6c46b
added transformer for GeneLength
VinzentRisch Feb 1, 2024
07fac99
Merge branch '42_Genelengts_type' into 23_norm_tpm
VinzentRisch Feb 1, 2024
942cd8b
changed input of normalize to genelength
VinzentRisch Feb 1, 2024
fe1d3a8
added transformer for GeneLength
VinzentRisch Feb 1, 2024
6dce0d8
added comment in test
VinzentRisch Feb 1, 2024
e2069ca
added comments
VinzentRisch Feb 2, 2024
ff80222
Merge branch '42_Genelengts_type' into 23_norm_tpm
VinzentRisch Feb 2, 2024
b6d2b94
added all normalization methods with tests except for CPM
VinzentRisch Feb 8, 2024
dd1f9ff
added cpm method
VinzentRisch Feb 8, 2024
3daa215
updated plugin setup
VinzentRisch Feb 9, 2024
011abf4
added rnanorm to meta.yaml
VinzentRisch Feb 9, 2024
907afa5
Revert "Merge branch '42_Genelengts_type' into 23_norm_tpm"
VinzentRisch May 29, 2024
6865eda
Revert "Merge branch '42_Genelengts_type' into 23_norm_tpm"
VinzentRisch May 29, 2024
9048180
merge main
VinzentRisch May 29, 2024
7b58de2
added transformer for allele annotation to sequencecharacteristics
VinzentRisch May 29, 2024
f4a20ed
changed normalize to work with characteristics
VinzentRisch May 29, 2024
6d416a3
changed tests
VinzentRisch May 29, 2024
085769b
changed parameter value error to value error
VinzentRisch Jun 3, 2024
7cf82d4
chnages after review
VinzentRisch Jun 5, 2024
f25449d
typo
VinzentRisch Jun 5, 2024
8bafcb2
typos
VinzentRisch Jun 5, 2024
20 changes: 11 additions & 9 deletions README.md
@@ -15,7 +15,7 @@ To install _q2-amr_, follow the steps described below.
 mamba create -yn q2-amr \
   -c https://packages.qiime2.org/qiime2/2024.2/shotgun/released/ \
   -c qiime2 -c conda-forge -c bioconda -c defaults \
-  qiime2 q2cli q2templates q2-types rgi
+  qiime2 q2cli q2templates q2-types rgi rnanorm

 conda activate q2-amr

@@ -40,7 +40,7 @@ qiime info
 CONDA_SUBDIR=osx-64 mamba create -yn q2-amr \
   -c https://packages.qiime2.org/qiime2/2024.2/shotgun/released/ \
   -c qiime2 -c conda-forge -c bioconda -c defaults \
-  qiime2 q2cli q2templates q2-types rgi
+  qiime2 q2cli q2templates q2-types rgi rnanorm

 conda activate q2-amr
 conda config --env --set subdir osx-64
@@ -62,15 +62,17 @@ qiime info
 ## Functionality
 This QIIME 2 plugin contains actions used to annotate short single/paired-end
 sequencing reads and MAGs with antimicrobial resistance genes. Currently, the [CARD](https://card.mcmaster.ca) database is supported (for details on
-the implementation and usage, please refer to the [rgi](https://github.com/arpcard/rgi) documentation). Below you will
+the implementation and usage, please refer to the [RGI](https://github.com/arpcard/rgi) documentation). Below you will
 find an overview of actions available in the plugin.

-| Action | Description | Underlying tool | Used function |
-|----------------------------|--------------------------------------------------------------------------------------|---------------------------------------|--------------------------------------|
-| fetch-card-db | Download and preprocess CARD and WildCARD data. | [rgi](https://github.com/arpcard/rgi) | card_annotation, wildcard_annotation |
-| annotate-mags-card | Annotate MAGs with antimicrobial resistance gene information from CARD. | [rgi](https://github.com/arpcard/rgi) | main, load |
-| annotate-reads-card | Annotate metagenomic reads with antimicrobial resistance gene information from CARD. | [rgi](https://github.com/arpcard/rgi) | bwt, load |
-| heatmap | Create a heatmap from annotate-mags-card output files. | [rgi](https://github.com/arpcard/rgi) | heatmap |
+| Action | Description | Underlying tool | Used function |
+|---------------------|--------------------------------------------------------------------------------------|-------------------------------------------|--------------------------------------|
+| fetch-card-db | Download and preprocess CARD and WildCARD data. | [RGI](https://github.com/arpcard/rgi) | card_annotation, wildcard_annotation |
+| annotate-mags-card | Annotate MAGs with antimicrobial resistance gene information from CARD. | [RGI](https://github.com/arpcard/rgi) | main, load |
+| annotate-reads-card | Annotate metagenomic reads with antimicrobial resistance gene information from CARD. | [RGI](https://github.com/arpcard/rgi) | bwt, load |
+| heatmap | Create a heatmap from annotate-mags-card output files. | [RGI](https://github.com/arpcard/rgi) | heatmap |
+| normalize | Normalize feature table with [FPKM](https://www.nature.com/articles/nmeth.1226), [TPM](https://link.springer.com/article/10.1007/s12064-012-0162-3), [TMM](https://genomebiology.biomedcentral.com/articles/10.1186/gb-2010-11-3-r25), [UQ](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/1471-2105-11-94), [CUF](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02568-9/), or [CTF](https://genomebiology.biomedcentral.com/articles/10.1186/s13059-021-02568-9/) method. | [rnaNORM](https://github.com/genialis/RNAnorm) | fpkm, tpm, tmm, uq, cuf, ctf |
+

 ## Dev environment
 This repository follows the _black_ code style. To make the development slightly easier
1 change: 1 addition & 0 deletions ci/recipe/meta.yaml
@@ -23,6 +23,7 @@ requirements:
     - q2templates {{ qiime2_epoch }}.*
     - q2cli {{ qiime2_epoch }}.*
     - rgi
+    - rnanorm
     - tqdm

 test:
8 changes: 1 addition & 7 deletions q2_amr/card/heatmap.py
@@ -8,7 +8,7 @@
 import pkg_resources
 import q2templates

-from q2_amr.card.utils import run_command
+from q2_amr.card.utils import InvalidParameterCombinationError, run_command
 from q2_amr.types import CARDAnnotationDirectoryFormat


@@ -50,12 +50,6 @@ def heatmap(
     q2templates.render(templates, output_dir, context=context)


-class InvalidParameterCombinationError(Exception):
-    def __init__(self, message="Invalid parameter combination"):
-        self.message = message
-        super().__init__(self.message)
-
-
 def run_rgi_heatmap(tmp, json_files_dir, clus, cat, display, frequency):
     cmd = [
         "rgi",
90 changes: 90 additions & 0 deletions q2_amr/card/normalization.py
@@ -0,0 +1,90 @@
import os

import pandas as pd
from q2_types.feature_data import SequenceCharacteristicsDirectoryFormat
from rnanorm import CPM, CTF, CUF, FPKM, TMM, TPM, UQ


def normalize(
    table: pd.DataFrame,
    method: str,
    m_trim: float = None,
    a_trim: float = None,
    gene_length: SequenceCharacteristicsDirectoryFormat = None,
) -> pd.DataFrame:
    # Validate parameter combinations and set trim parameters
    m_trim, a_trim = _validate_parameters(method, m_trim, a_trim, gene_length)

    # Process gene_lengths input and define methods that need gene_lengths input
    if method in ["tpm", "fpkm"]:
        lengths = _convert_lengths(table, gene_length)

        methods = {
            "tpm": TPM(gene_lengths=lengths),
            "fpkm": FPKM(gene_lengths=lengths),
        }

    # Define remaining methods that don't need gene_lengths input
    else:
        methods = {
            "tmm": TMM(m_trim=m_trim, a_trim=a_trim),
            "ctf": CTF(m_trim=m_trim, a_trim=a_trim),
            "uq": UQ(),
            "cuf": CUF(),
            "cpm": CPM(),
        }

    # Run normalization method on frequency table
    normalized = methods[method].set_output(transform="pandas").fit_transform(table)
    normalized.index.name = "sample_id"

    return normalized


def _validate_parameters(method, m_trim, a_trim, gene_length):
    # Raise Error if gene-length is missing when using methods TPM or FPKM
    if method in ["tpm", "fpkm"] and not gene_length:
        raise ValueError("gene-length input is missing.")

    # Raise Error if gene-length is given when using methods TMM, UQ, CUF, CPM or CTF
    if method in ["tmm", "uq", "cuf", "ctf", "cpm"] and gene_length:
        raise ValueError(
            "gene-length input can only be used with FPKM and TPM methods."
        )

    # Raise Error if m_trim or a_trim are given when not using methods TMM or CTF
    if (method not in ["tmm", "ctf"]) and (m_trim is not None or a_trim is not None):
        raise ValueError(
            "Parameters m-trim and a-trim can only be used with methods TMM and CTF."
        )

    # Set m_trim and a_trim to their default values for methods TMM and CTF
    if method in ["tmm", "ctf"]:
        m_trim = 0.3 if m_trim is None else m_trim
        a_trim = 0.05 if a_trim is None else a_trim

    return m_trim, a_trim


def _convert_lengths(table, gene_length):
    # Read in table from sequence_characteristics.tsv as a pd.Series.
    # Series.squeeze("columns") is used instead of read_csv(squeeze=True),
    # which was deprecated in pandas 1.4 and removed in pandas 2.0.
    lengths = pd.read_csv(
        os.path.join(gene_length.path, "sequence_characteristics.tsv"),
        sep="\t",
        header=None,
        names=["index", "values"],
        index_col="index",
        skiprows=1,
    ).squeeze("columns")

    # Check if all gene IDs that are present in the table are also present in
    # the lengths
    if not set(table.columns).issubset(set(lengths.index)):
        only_in_counts = set(table.columns) - set(lengths.index)
        raise ValueError(
            "There are genes present in the FeatureTable that are not present "
            "in the gene-length input. Missing lengths for genes: "
            f"{only_in_counts}"
        )
    return lengths
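
For orientation, the two length-aware methods that `normalize` delegates to rnanorm can be sketched in plain Python. This is not rnanorm's implementation: the helper names `fpkm` and `tpm`, the gene IDs and the numbers are all hypothetical, and the sketch assumes the commonly used definitions, FPKM = count * 1e9 / (gene length in bp * library size) and TPM = (count / length) / sum of (count / length) * 1e6, applied to one sample at a time.

```python
def fpkm(counts, lengths):
    """FPKM for one sample: scale by library size (per million) and length (per kb)."""
    library_size = sum(counts.values())  # assumed to be non-zero in this sketch
    return {
        gene: count * 1e9 / (lengths[gene] * library_size)
        for gene, count in counts.items()
    }


def tpm(counts, lengths):
    """TPM for one sample: length-normalize first, then rescale to one million."""
    rates = {gene: count / lengths[gene] for gene, count in counts.items()}
    total_rate = sum(rates.values())  # assumed to be non-zero in this sketch
    return {gene: rate / total_rate * 1e6 for gene, rate in rates.items()}


# Hypothetical sample: counts are proportional to gene length, so every gene
# ends up with the same normalized value under both methods.
sample = {"geneA": 10, "geneB": 20, "geneC": 70}
lengths = {"geneA": 1000, "geneB": 2000, "geneC": 7000}

tpm_values = tpm(sample, lengths)
fpkm_values = fpkm(sample, lengths)
```

The order of operations is the practical difference: TPM values of a sample always sum to one million, while FPKM values do not, which is one reason the gene-length input is required for both methods and rejected for the others.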
3 changes: 3 additions & 0 deletions q2_amr/card/tests/data/feature-table.tsv
@@ -0,0 +1,3 @@
ID ARO:3000026|ID:377|Name:mepA|NCBI:AY661734.1 ARO:3000027|ID:1757|Name:emrA|NCBI:AP009048.1
sample1 2.0 0.0
sample2 2.0 0.0
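
The test table above makes a small worked case for the CPM method. The following is a plain-Python sketch, not the rnanorm code path used by the action; the helper name `cpm` is hypothetical and it assumes the usual definition, CPM = count / library size * 1e6, computed per sample.

```python
def cpm(sample_counts):
    """Counts per million for one sample (library size assumed non-zero)."""
    library_size = sum(sample_counts.values())
    return {
        gene: count / library_size * 1e6 for gene, count in sample_counts.items()
    }


# Gene IDs and counts copied from feature-table.tsv
MEPA = "ARO:3000026|ID:377|Name:mepA|NCBI:AY661734.1"
EMRA = "ARO:3000027|ID:1757|Name:emrA|NCBI:AP009048.1"

table = {
    "sample1": {MEPA: 2.0, EMRA: 0.0},
    "sample2": {MEPA: 2.0, EMRA: 0.0},
}

normalized = {sample: cpm(counts) for sample, counts in table.items()}
# Each sample: mepA -> 1000000.0, emrA -> 0.0
```

Because all reads of each sample fall on mepA, the whole million is assigned to that gene, which makes the table easy to check by hand.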