Open a GH issue for the Spark empty BAM case #232
Issue Draft
Title
MarkDuplicatesSpark error when FASTQ headers contain another @ in string
Content
Headers with another
Example header:
@madisonjordan Looks good! I think it's worth mentioning that the example was found in one of the ICGC (International Cancer Genome Consortium) samples and that there are many others in ICGC datasets. As we expect many more samples like this and want to avoid fixing the headers internally, we may want to consider adding
Agreed. We do have the option to just not mark duplicates, but it would be good to have a working option for these types of samples in the pipeline.
Yup, my only concern is that some other GATK commands may have the same issue, so we'll have to test it out first.
I'll create the issue right now with the suggested change about where the data came from.
Sounds good. Here is the test example Alfredo made that we can use for testing/development:
If any of you know of other example FASTQ files with @ in the header, or other problems with the header, please share. EDIT: Before adding
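For anyone checking their own files for this pattern, here is a minimal Python sketch (not part of GATK or the pipeline; the function names `find_headers_with_extra_at` and `scan_fastq` are hypothetical) that flags FASTQ records whose header line contains another `@` after the leading one. It assumes standard four-line records (header, sequence, `+`, quality) and handles plain or gzip-compressed files.

```python
import gzip
from typing import Iterable, Iterator, List, Tuple


def find_headers_with_extra_at(lines: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Yield (record_number, header) for headers containing a second '@'.

    Assumes standard 4-line FASTQ records, so every 4th line
    (starting at line 0) is a header line.
    """
    for i, line in enumerate(lines):
        if i % 4 != 0:
            continue  # skip sequence, '+' separator, and quality lines
        header = line.rstrip("\n")
        if not header.startswith("@"):
            raise ValueError(f"line {i + 1} is not a FASTQ header: {header!r}")
        # Look for '@' anywhere after the leading one.
        if "@" in header[1:]:
            yield i // 4 + 1, header


def scan_fastq(path: str) -> List[Tuple[int, str]]:
    """Scan a FASTQ file (gzip-compressed if it ends in .gz) for offending headers."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return list(find_headers_with_extra_at(handle))


if __name__ == "__main__":
    import sys

    for path in sys.argv[1:]:
        for record, header in scan_fastq(path):
            print(f"{path}\trecord {record}\t{header}")
```

The check is positional rather than "any line starting with @" because quality lines can legitimately begin with `@`. It would be run as something like `python find_extra_at.py sample_R1.fastq.gz`, where the script and file names are placeholders.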
Logs:
Here's what we have now:
Bug Report
Affected tool(s) or class(es)
MarkDuplicatesSpark
Affected version(s)
Description
Headers with another
Example header:
Steps to reproduce
Command:
Expected behavior
MarkDuplicatesSpark finishes successfully and outputs a valid BAM file.
Actual behavior
The BAM file is empty.
@jarbet That's right, but the problem with Picard is that it's single-threaded, so it's too slow even if it works.
It looks like Alfredo provided enough info here: #223
Can you check align-DNA 8.0.0? (Ignore 3.X -> IndelRealignment)
We should be able to get the actual command from the process logs without showing cluster paths if the output dir is still on the cluster. I believe the same goes for the logs.
Oh, I see his comment about the version and the manual command he used, which also failed. That should be fine. Thanks! @tyamaguchi-ucla, is the info in this log okay? I had to cut out a lot of content because it was too long, but I kept the main parts.
Yup, looks good. It seems that no notifications are sent (even when tagging people) when the edit feature is used.
Done. I'm putting the link to the newly created GATK issue in our issue #223.