vg call issues #4475
Comments
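Your GBZ graphs do not have reference paths due to the way you built them. The vg gbwt commands you used only include haplotype paths from the VCF. You would have to build separate GBWT indexes for the reference paths and merge them with the haplotype GBWTs. It is much easier to use vg autoindex, which does all this automatically. The third command probably crashes because the graph and the snarl file do not match. You built the snarls for the GBZ graph, which only contains the nodes and edges used by the haplotype paths in the GBWT index. Your XG graph is based on […]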
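To make the mismatch concrete, here is a toy Python model. It is only an illustration under simplifying assumptions: it does not read GBZ, XG, or snarl files, and the node IDs and paths are hypothetical. Because a GBZ graph keeps only the nodes used by haplotype paths, a graph built directly from the VCF can contain nodes that the GBZ graph lacks, so structures computed on one graph need not line up with the other.

```python
# Toy model (not vg's file formats): a graph is a set of node IDs, and each
# haplotype path is a list of node IDs it visits.

def nodes_on_paths(paths):
    """Return the set of node IDs visited by at least one path."""
    used = set()
    for path in paths:
        used.update(path)
    return used


def missing_from_subgraph(full_nodes, paths):
    """Nodes in the full graph that the path-induced subgraph does not contain."""
    return set(full_nodes) - nodes_on_paths(paths)


# Hypothetical example: node 4 exists in the graph built from the VCF,
# but no haplotype path in the GBWT visits it.
full_graph_nodes = {1, 2, 3, 4, 5}
haplotype_paths = [[1, 2, 5], [1, 3, 5]]

if __name__ == "__main__":
    print(sorted(missing_from_subgraph(full_graph_nodes, haplotype_paths)))  # prints [4]
```

Snarls computed on the smaller graph describe its structure only, which is why pairing them with the full XG graph is expected to fail.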
I could not use vg autoindex; I got the error below. Note that I merged the per-chromosome VCFs and ran:
vg autoindex --workflow giraffe --prefix /mnt/NEOGENE4/projects/pipeline_2024/autoindex --ref-fasta GCA_000001405.15_GRCh38_no_alt_plus_hs38d1_analysis_set.fna --vcf concatenated.vcf.gz
The last part of the error is below (the full error log is attached: 2071261-gibbon.err.zip):
My best guess is that vg tried to open too many files simultaneously and hit the limit. The first part of the error log is vg complaining about SVs that may be represented incorrectly, or that vg at least does not understand. They should be irrelevant to the issue at hand. In the second part, vg complains about a large number of contigs with no variants. We also see that the input FASTA and VCF files have been chunked into hundreds of parts (one per contig). The actual error then comes from […]. You could try increasing the number of file descriptors with […].
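If vg is launched from a Python wrapper script, the per-process open-file limit (the same limit the shell's ulimit controls) can be inspected and raised with the standard resource module. This is a minimal sketch; the helper name raise_open_file_limit is hypothetical. An unprivileged process can only raise its soft limit up to the hard limit; raising the hard limit itself requires administrator changes on the server.

```python
import resource


def raise_open_file_limit():
    """Raise this process's soft limit on open files as far as allowed.

    An unprivileged process may raise its soft limit up to the hard limit,
    but not past it. Returns the (soft, hard) limits in effect afterwards.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            # Some platforms reject this request (macOS, for instance, may
            # report an "unlimited" hard limit that cannot be used as a soft
            # limit); keep the current soft limit in that case.
            pass
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = raise_open_file_limit()
    print(f"open file limit: soft={soft}, hard={hard}")
    # Resource limits are inherited by child processes, so a command started
    # afterwards (e.g. via subprocess.run(["vg", "autoindex", ...])) runs
    # with the raised soft limit.
```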
After increasing the ulimit, I now get the error below. I guess this is because there is not enough storage in /tmp/xg-254s2K, right? Is there a way to reduce the amount of data vg writes to that folder?
The temporary files are unavoidable. You can select another directory for them by using […].
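When choosing another directory for the temporary files, it can help to check free space first. The sketch below picks the candidate directory with the most free space using shutil.disk_usage. The helper name, the candidate paths, and the size threshold are hypothetical, and passing the choice to vg through the TMPDIR environment variable is an assumption (many Unix tools honor it); the option the maintainer actually recommended is not preserved in this thread.

```python
import os
import shutil


def pick_temp_dir(candidates, min_free_bytes):
    """Return the existing candidate directory with the most free space.

    Directories that do not exist, or that have less than `min_free_bytes`
    available, are skipped. Returns None if no candidate qualifies.
    """
    best, best_free = None, -1
    for path in candidates:
        if not os.path.isdir(path):
            continue
        free = shutil.disk_usage(path).free
        if free >= min_free_bytes and free > best_free:
            best, best_free = path, free
    return best


if __name__ == "__main__":
    # Hypothetical candidates and threshold; adjust for your server.
    candidates = ["/tmp", "/mnt/NEOGENE4/projects/pipeline_2024/tmp"]
    choice = pick_temp_dir(candidates, min_free_bytes=100 * 1024**3)
    if choice is None:
        print("no candidate directory has enough free space")
    else:
        # Assumption: vg respects TMPDIR for its temporary files.
        env = dict(os.environ, TMPDIR=choice)
        print("using", choice)
        # subprocess.run(["vg", "autoindex", ...], env=env, check=True)
```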
Thank you @jltsiren, but our server does not allow such large temporary files. In your previous message, you said: "Your GBZ graphs do not have reference paths due to the way you built them. The vg gbwt commands you used only include haplotype paths from the VCF. You would have to build separate GBWT indexes for the reference paths and merge them with the haplotype GBWTs. It is much easier to use vg autoindex, which does all this automatically." Can you give a bit more detail on how I can achieve this without using vg autoindex?
Let's go back to your original commands. First, you should use option […]. Then you should create another GBWT with the reference paths. The following command should work: […]
The reference paths in […]. Then you merge […].
The order of the arguments is important: if you switch the two GBWT files, the command will probably take several days to complete. Now you can create a GBZ graph from […]. Because your VCF files contain thousands of haplotypes, Giraffe will probably be faster and more accurate with a downsampled GBZ. You can create it with: […]
You may want to try both GBZ graphs to see which of them works better.
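Once the merged graph is built, one way to confirm that reference paths made it in is to inspect a GFA export of the graph. This sketch rests on assumptions: it assumes you already have a GFA export (how to produce it is not covered in this thread), and that the export writes reference paths as P-lines and haplotypes as W-lines, a common convention in GFA 1.1 that you should verify against your vg version. The function names, the sample name, and the toy GFA lines are hypothetical.

```python
def summarize_gfa_paths(lines):
    """Collect P-line path names and W-line (sample, haplotype, contig) keys."""
    p_paths, walks = [], []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "P" and len(fields) >= 3:
            p_paths.append(fields[1])
        elif fields[0] == "W" and len(fields) >= 7:
            walks.append((fields[1], fields[2], fields[3]))
    return p_paths, walks


def has_reference_path(lines, contig):
    """True if some P-line path is named exactly `contig` (e.g. "chr1")."""
    p_paths, _ = summarize_gfa_paths(lines)
    return contig in p_paths


# Hypothetical toy GFA: one reference path (P-line) and one haplotype walk (W-line).
example = [
    "H\tVN:Z:1.1",
    "S\t1\tACGT",
    "S\t2\tT",
    "L\t1\t+\t2\t+\t0M",
    "P\tchr1\t1+,2+\t*",
    "W\tsample1\t1\tchr1\t0\t5\t>1>2",
]

if __name__ == "__main__":
    print(has_reference_path(example, "chr1"))  # prints True
    # A haplotype-only graph has no P-line for chr1:
    haplotypes_only = [line for line in example if not line.startswith("P\t")]
    print(has_reference_path(haplotypes_only, "chr1"))  # prints False
```

A graph with only W-lines would match the "reference path not found" symptom reported for commands (1) and (2).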
1. What were you trying to do?
Trying to call/genotype from the vg graph.
Commands:
1 - vg call all.gbz -r yamnaya.snarls -k yamnaya.pack -s Yamnaya -z > genotypes_yamnaya.vcf
2 - vg call all.gbz -r yamnaya.snarls -k yamnaya.pack -s Yamnaya -v concatenated.vcf.gz > genotypes_yamnaya.vcf
3 - vg call all.xg -k yamnaya.pack -r yamnaya.snarls > genotypes_yamnaya.vcf
Here's how I generated the graph:
And mappings:
2. What did you want to happen?
Generate calls for both small and large variants.
3. What actually happened?
Commands (1) and (2) give a "reference path not found" error, and (3) gives the error in the attachment.
4. If you got a line like "Stack trace path: /somewhere/on/your/computer/stacktrace.txt", please copy-paste the contents of that file here: vg_error.txt
5. What data and command can the vg dev team use to make the problem happen?
1 - vg call all.gbz -r yamnaya.snarls -k yamnaya.pack -s Yamnaya -z > genotypes_yamnaya.vcf
2 - vg call all.gbz -r yamnaya.snarls -k yamnaya.pack -s Yamnaya -v concatenated.vcf.gz > genotypes_yamnaya.vcf
3 - vg call all.xg -k yamnaya.pack -r yamnaya.snarls > genotypes_yamnaya.vcf
6. What does running vg version say?