ValueError: No more features left. Check to make sure that the sample names between sample-metadata and table are consistent #34

Open
johannesbjork opened this issue Sep 1, 2020 · 1 comment


johannesbjork commented Sep 1, 2020

Running the stand-alone version of gemelli on the example data used in the tutorial, I get the error ValueError: No more features left. Check to make sure that the sample names between `sample-metadata` and `table` are consistent

As I'm not a Python person, I filter the example data in R.

library(dplyr)     # provides %>% and filter()
library(phyloseq)  # provides phyloseq(), otu_table(), sample_data()

mdat <- read.table("IBD-2538/data/metadata.tsv", sep='\t', header=T) # nrow(mdat) 516
ftbl <- biomformat::read_biom("IBD-2538/data/table.biom")
ftbl <- as(biomformat::biom_data(ftbl), "matrix") # ncol(ftbl) 470

mdat <- mdat %>% filter(sample_name %in% colnames(ftbl))
rownames(mdat) <- mdat$sample_name

ps <- phyloseq(otu_table(ftbl, taxa_are_rows=T),
                   sample_data(mdat))
# here I skip adding the taxonomy

ps <- metagMisc::phyloseq_filter_prevalence(ps, prev.trh=0.2, abund.trh=10, abund.type="total", threshold_condition="AND")

> ps
phyloseq-class experiment-level object
otu_table()   OTU Table:         [ 236 taxa and 318 samples ]
sample_data() Sample Data:       [ 318 samples by 128 sample variables ]

# Do we need to filter to only keep subjects with >=t timepoints?

biomformat::write_biom(biomformat::make_biom(t(otu_table(ps))), "table_filt.biom")
write.table(sample_data(ps), "metadata_filt.txt", sep="\t", quote=F)

Having made sure that the samples match between the feature table and the metadata (and having filtered out rare features), I run gemelli and get the following error:

gemelli \
--in-biom table_filt.biom \
--sample-metadata-file metadata_filt.txt \
--individual-id-column 'host_subject_id' \
--state-column-1 'timepoint' \
--output-dir results      

Traceback (most recent call last):
  File "/Users/johannesbjork/python/miniconda3/bin/gemelli", line 8, in <module>
    sys.exit(standalone_ctf())
  File "/Users/johannesbjork/python/miniconda3/lib/python3.7/site-packages/click/core.py", line 829, in __call__
    return self.main(*args, **kwargs)
  File "/Users/johannesbjork/python/miniconda3/lib/python3.7/site-packages/click/core.py", line 782, in main
    rv = self.invoke(ctx)
  File "/Users/johannesbjork/python/miniconda3/lib/python3.7/site-packages/click/core.py", line 1066, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/johannesbjork/python/miniconda3/lib/python3.7/site-packages/click/core.py", line 610, in invoke
    return callback(*args, **kwargs)
  File "/Users/johannesbjork/python/miniconda3/lib/python3.7/site-packages/gemelli/scripts/_standalone_ctf.py", line 131, in standalone_ctf
    feature_metadata)
  File "/Users/johannesbjork/python/miniconda3/lib/python3.7/site-packages/gemelli/ctf.py", line 97, in ctf_helper
    raise ValueError(("No more features left.  Check to make sure that "
ValueError: No more features left.  Check to make sure that the sample names between `sample-metadata` and `table` are consistent
@cameronmartino
Collaborator

Hi @johannesbjork,

Thank you for reporting this! The standalone CLI is the only tutorial I did not make and it seems that was an oversight on my part.

The error occurs because the sample IDs look like floating-point numbers, so pandas loads them as floats while biom loads them as strings. As a result, no sample IDs match between the table and the metadata, which triggers the gemelli error shown above.
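
A minimal sketch of that mismatch in Python, using made-up sample IDs (the real IDs in the example data may look different). pandas infers a float column for numeric-looking IDs, so none of them equal the string IDs that biom returns. Reading the ID column with an explicit string dtype is one possible workaround on the pandas side; it is shown only to illustrate the cause and is not necessarily what gemelli does internally.

```python
import io

import pandas as pd

# Hypothetical metadata with numeric-looking sample IDs (illustration only).
tsv = (
    "sample_name\thost_subject_id\ttimepoint\n"
    "1629.10\tA\t1\n"
    "1629.20\tB\t1\n"
)

# IDs as a BIOM table would report them: plain strings.
table_ids = {"1629.10", "1629.20"}

# Default parsing: pandas infers float64 for the ID column,
# so "1629.10" becomes the number 1629.1.
meta_default = pd.read_csv(io.StringIO(tsv), sep="\t", index_col=0)
shared_default = table_ids & set(meta_default.index)

# Forcing the ID column to str keeps the IDs exactly as written.
meta_str = pd.read_csv(
    io.StringIO(tsv), sep="\t", dtype={"sample_name": str}
).set_index("sample_name")
shared_str = table_ids & set(meta_str.index)

print(meta_default.index.dtype, len(shared_default))
print(len(shared_str))
```

Note that converting the float index back to strings afterwards would not recover IDs with trailing zeros, which is why the string dtype has to be requested at read time.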

I just fixed this in the tables here (fixed-IBD-example.zip) by adding a string ('s') to the sample names.
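
For anyone who wants to apply the same kind of fix to their own files, here is a rough sketch in Python. It assumes the 's' is added as a prefix (the exact position used in the fixed tables is not stated above), and `prefix_ids` is a hypothetical helper, not part of gemelli or biom. The same ID map would also have to be applied to the sample IDs of the BIOM table so that both files stay consistent.

```python
import io

import pandas as pd


def prefix_ids(ids, prefix="s"):
    """Map each original ID to a prefixed ID that no longer parses as a number."""
    return {str(i): f"{prefix}{i}" for i in ids}


# Hypothetical metadata whose IDs were read as strings (illustration only).
meta = pd.DataFrame(
    {"host_subject_id": ["A", "B"], "timepoint": [1, 1]},
    index=["1629.10", "1629.20"],
)

id_map = prefix_ids(meta.index)
meta_fixed = meta.rename(index=id_map)

# Write and re-read with default settings: the prefixed IDs stay strings.
buffer = io.StringIO(meta_fixed.to_csv(sep="\t", index_label="sample_name"))
round_trip = pd.read_csv(buffer, sep="\t", index_col=0)

print(list(round_trip.index))
```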

I will put in a PR for this fix and a standalone tutorial (issue #35).

The following command runs fine:

mkdir standalone-results
gemelli \
    --in-biom fixed-IBD-example/table.biom \
    --sample-metadata-file fixed-IBD-example/metadata.tsv \
    --individual-id-column 'host_subject_id' \
    --state-column-1 'timepoint' \
    --output-dir standalone-results

But to save runtime (since this is an example) you could also remove singletons with the --min-feature-count flag:

gemelli \
    --in-biom fixed-IBD-example/table.biom \
    --sample-metadata-file fixed-IBD-example/metadata.tsv \
    --individual-id-column 'host_subject_id' \
    --state-column-1 'timepoint' \
    --min-feature-count 1 \
    --output-dir standalone-results

This also brings up a good point that a tutorial with R integration would be nice. I have added that to issue #35.

Thank you again for letting me know, and please let me know if this does not solve the problem for you.
