GTDBTK_CLASSIFYWF process does not put the summary.tsv file in the output directory #637

Closed
jhayer opened this issue Jul 16, 2024 · 8 comments
Labels
bug Something isn't working

Comments

@jhayer

jhayer commented Jul 16, 2024

Description of the bug

Hi,
We have been running two versions of nf-core/mag (2.5.1 and 3.0.1), and we end up with missing files in the GTDB-Tk output directory. The summary.tsv files are present in the work directories, but they do not seem to be published to the main output directory.

The files that are present in the output directories for GTDB-Tk are the following:

gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.bac120.msa.fasta.gz
gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.bac120.user_msa.fasta.gz
gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.backbone.bac120.classify.tree.gz
gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.log
gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.warnings.log

In the corresponding work directory, I have these files:

 4,0K 13 juil. 12:33 bins/
 4,0K 13 juil. 12:33 database/
    0 13 juil. 12:33 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.warnings.log
    0 13 juil. 12:33 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.failed_genomes.tsv
  803 13 juil. 12:33 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.translation_table_summary.tsv
  29K 13 juil. 12:33 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.bac120.markers_summary.tsv
  14K 13 juil. 12:33 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.ar53.markers_summary.tsv
    0 13 juil. 12:35 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.bac120.filtered.tsv
 173M 13 juil. 12:36 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.bac120.msa.fasta.gz
  45K 13 juil. 12:36 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.bac120.user_msa.fasta.gz
 416K 13 juil. 13:29 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.bac120.classify.tree.5.tree
 500K 13 juil. 14:21 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.bac120.classify.tree.4.tree
 588K 13 juil. 16:46 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.bac120.classify.tree.3.tree
 569K 13 juil. 16:59 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.bac120.classify.tree.1.tree
 228K 13 juil. 17:03 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.bac120.classify.tree.8.tree
 4,0K 13 juil. 17:08 pplacer_tmp/
 466K 13 juil. 17:08 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.bac120.classify.tree.7.tree
 1,2K 13 juil. 17:08 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.bac120.tree.mapping.tsv
  33K 13 juil. 17:08 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.bac120.summary.tsv
 8,0K 13 juil. 17:08 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.log
 5,6K 13 juil. 17:08 gtdbtk.json
 4,0K 13 juil. 17:08 classify/
 4,0K 13 juil. 17:08 identify/
 4,0K 13 juil. 17:08 align/
  68K 13 juil. 17:08 gtdbtk.MEGAHIT-DASTool-unclassified-dastool_refined-BD2.backbone.bac120.classify.tree.gz
   61 13 juil. 17:08 versions.yml
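The comparison above (files present in the work directory but absent from the published output directory) can be automated. The following is a minimal sketch, not part of the pipeline: the function names are hypothetical, and it assumes both directories are flat listings like the ones shown, comparing files by name only.

```python
from pathlib import Path


def unpublished_files(work_dir: str, out_dir: str) -> list[str]:
    """Return names of files found in work_dir that have no same-named file in out_dir."""
    work = {p.name for p in Path(work_dir).iterdir() if p.is_file()}
    published = {p.name for p in Path(out_dir).iterdir() if p.is_file()}
    return sorted(work - published)


def unpublished_summaries(work_dir: str, out_dir: str) -> list[str]:
    """Narrow the comparison to the per-sample '*.summary.tsv' files."""
    return [
        name
        for name in unpublished_files(work_dir, out_dir)
        if name.endswith(".summary.tsv")
    ]
```

Calling `unpublished_summaries(<work dir>, <output dir>)` on the two directories listed above would be expected to report the `bac120.summary.tsv` file, since it appears only in the work directory listing. Files such as `versions.yml` also show up in `unpublished_files`, which is why the second helper filters by suffix.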

This leads to a second problem, namely that the main GTDB-Tk summary has no classification for any of the samples:

[hayer@node30 GTDB-Tk]$ head gtdbtk_summary.tsv 
user_genome	classification	fastani_reference	fastani_reference_radius	fastani_taxonomy	fastani_ani	fastani_af	closest_placement_reference	closest_placement_radius	closest_placement_taxonomy	closest_placement_ani	closest_placement_af	pplacer_taxonomy	classification_method	note	other_related_references(genome_id,species_name,radius,ANI,AF)	msa_percent	translation_table	red_value	warnings
MEGAHIT-MaxBin2-KT2.065.fa																	
MEGAHIT-MaxBin2-BD2.081.fa																	
MEGAHIT-MetaBAT2-CT2.33.fa																	
MEGAHIT-MaxBin2-BD2.070.fa																	
MEGAHIT-MaxBin2-KDT2.081.fa																	
MEGAHIT-MaxBin2-JT2.038.fa																	
MEGAHIT-MaxBin2-JT2.057.fa																	
MEGAHIT-MaxBin2-KDT2.091.fa																	
MEGAHIT-MetaBAT2-SA2.10.fa
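Because `head` shows only the first ten lines, it can be useful to count how many rows of the merged summary actually lack a classification. This is a minimal sketch under the assumption that the file is tab-separated with the `user_genome` and `classification` columns shown in the header above; the function name is hypothetical.

```python
import csv


def classification_counts(summary_path: str) -> tuple[int, int]:
    """Return (total_rows, rows_with_empty_classification) for a summary TSV."""
    total = 0
    empty = 0
    with open(summary_path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            total += 1
            # Rows that stop after the genome name yield None for the
            # missing columns, so normalise to a string before testing.
            if not (row.get("classification") or "").strip():
                empty += 1
    return total, empty
```

For example, `total, empty = classification_counts("gtdbtk_summary.tsv")` gives the number of bins listed and the number without a classification, which distinguishes a fully empty table from one that is only empty in its first rows.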

Do you have an idea of what could go wrong here?
Thanks :-)

Command used and terminal output

nextflow run nf-core/mag -r 3.0.1 -profile singularity -resume -params-file nf-params.json -c local.config

Relevant files

file nf-params.json is:

{
    "input": "./khsamplesheet.csv",
    "outdir": "./out_khsample",
    "skip_adapter_trimming": true,
    "busco_db": "/projects/large/ARCIMED/DATABASE/busco_v5/busco_downloads",
    "busco_auto_lineage_prok": true,
    "cat_db": "/share/banks/CAT_db_2024-03-29/",
    "gtdb_db": "/projects/large/ARCIMED/DATABASE/gtdb_db/gtdbtk_r214_data.tar.gz",
    "genomad_db": "/projects/large/ARCIMED/DATABASE/genomad_db/genomad_db_v1.5/",
    "skip_spades": true,
    "skip_metaeuk": true,
    "skip_concoct": true,
    "run_virus_identification": true,
    "binning_map_mode": "own",
    "busco_clean": true,
    "refine_bins_dastool": true,
    "postbinning_input": "both",
    "run_gunc": true,
    "gunc_database_type": "gtdb",
    "gunc_save_db": true
}

file local.config

executor {
    name = 'slurm'
}

process {
    clusterOptions = '-p highmem --nodelist=node30'
    // You can also override existing process cpu or time settings here too

    withName: BUSCO {
        errorStrategy = 'ignore'
    }
}

[nextflow.log](https://github.com/user-attachments/files/16246970/nextflow.log)


System information

I am using Nextflow v. 23.04.2 
nf-core/mag -r 3.0.1
Slurm
Singularity engine

@jhayer added the bug (Something isn't working) label on Jul 16, 2024

@jfy133
Member

jfy133 commented Jul 17, 2024

Hi @jhayer !

Thanks for the report.

Empty columns in that file can sometimes be valid behaviour. It looks like GTDB-Tk did not fully complete, but since the pipeline didn't fail, it may have ended in a way that GTDB-Tk itself considers valid.

Could you please share the (hidden) .command.log file from the working directory, and also the main (hidden) .nextflow.log file of the whole run?

@amizeranschi
Contributor

I'm also having some issues with GTDBTK, which I've detailed in another issue: #641

In my case, it looks like the tool was only run on one sample (of several). I did get the gtdbtk_summary.tsv in the output directory (it was in <job_dir>/Taxonomy/GTDB-Tk), but it only contained proper results for bins from that sample, and empty lines for the other samples.

@jfy133
Member

jfy133 commented Aug 21, 2024

@jhayer any chance you still have those log files? Otherwise it will be hard to investigate further.

@jhayer
Author

jhayer commented Aug 27, 2024

OK, I am sorry: the gtdbtk_summary.tsv is actually not empty for all bins, only for the first 840 lines. The other half has information in all columns, so yes, this might be normal behaviour.

But I am still wondering why the *summary.tsv files are not present in the results directory of each sample (e.g. in Taxonomy/GTDB-Tk/MEGAHIT/DASTool/BD2/). Is that intended, or is it a bug?

@jfy133
Member

jfy133 commented Sep 11, 2024

I'm not 💯 sure still (sorry, I've been on parental leave then my son was sick)

While checking/testing another GTDB-Tk pull request, I noticed that in some cases, if GTDB-Tk doesn't find any hits, it simply doesn't produce an output file that can be parsed.

For those samples that don't have a summary.tsv, can you check the corresponding .log file, and see if the end of the log reports something along those lines?

I've just had a thought how I can maybe demonstrate this, but please check in the meantime
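The check suggested above (find the samples without a summary.tsv and read the end of their logs) could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, it assumes the `gtdbtk.<prefix>.log` and `*.summary.tsv` naming seen in the listings earlier in this issue, and it makes no assumption about what GTDB-Tk actually writes at the end of its log.

```python
from collections import deque
from pathlib import Path


def logs_without_summary(results_root: str) -> list[Path]:
    """Return GTDB-Tk main logs whose directory contains no '*.summary.tsv'."""
    orphans = []
    for log in sorted(Path(results_root).rglob("gtdbtk.*.log")):
        # Skip the separate warnings log; only the main log is of interest.
        if log.name.endswith(".warnings.log"):
            continue
        if not any(log.parent.glob("*.summary.tsv")):
            orphans.append(log)
    return orphans


def tail(path: Path, n: int = 10) -> list[str]:
    """Return the last n lines of a text file without trailing newlines."""
    with open(path, errors="replace") as handle:
        return [line.rstrip("\n") for line in deque(handle, maxlen=n)]
```

Looping over `logs_without_summary("<outdir>/Taxonomy/GTDB-Tk")` and printing `tail(log)` for each result shows the final log lines for every sample directory that lacks a summary file, so they can be inspected by eye.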

@jfy133
Member

jfy133 commented Sep 20, 2024

I've found the issue @jhayer !!!

This is apparently a three year old bug that no-one picked up on 😅

[image: screenshot, not preserved]

@d4straub
Collaborator

Uff, sorry for that, my bad!

@jfy133
Member

jfy133 commented Sep 20, 2024

Sorry this took such a long time for such a small fix @jhayer ! Thanks for reporting!

Fix in: https://github.com/nf-core/mag/pull/673/files

@jfy133 closed this as completed on Sep 20, 2024