PacBio #143
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Merged
43 commits
5d65b48
initial changes
antgonza 844e4bd
Merge branch 'main' of https://github.com/qiita-spots/qp-knight-lab-p…
antgonza 55ab210
init push
antgonza 4439472
fix InstrumentUtils
antgonza 6bf3586
rm test_required_file_checks
antgonza 411a119
m11111_20250101_111111_s4
antgonza e8a2d0b
add conda
antgonza 5e16ab1
pacbio_generate_bam2fastq_commands
antgonza 827a372
self.node_count -> self.nprocs
antgonza 551bfd1
generate_sequence_counts
antgonza 6311723
init changes to def _inject_data(self, wf):
antgonza 40afadf
read_length
antgonza 52f4c53
pmls_extra_parameters
antgonza e18cf5d
rm index in _inject_data
antgonza 0734a6e
dstats
antgonza 281bf75
nuqc_job_single.sh
antgonza 1aa102c
rm extra ,
antgonza 01c862d
print Profile selected
antgonza 05036b7
counts.txt in _inject_data
antgonza d5adee0
demux_just_fwd
antgonza 2fcfc28
demux_single -? demux_just_fwd
antgonza 27f49d1
add cli for demux_just_fwd
antgonza dea4fd9
fix demux_just_fwd params
antgonza e596ade
rm demux_just_fwd_processing splitter_binary
antgonza b27e7d6
self.files_regex = long_read
antgonza 1666516
sample_id_column
antgonza 0141440
mv self.read_length = read_length up
antgonza 2b5700f
_filter_empty_fastq_files
antgonza cc29d39
zip_longest
antgonza 3a94746
raw_reads_r1r2
antgonza 6e12dd5
fastq.gz -> trimmed.fastq.gz
antgonza b2a308e
barcode
antgonza 1da1abb
barcode -> barcode_id
antgonza e600bfb
S000
antgonza d73b6a7
del raw_reverse_seqs
antgonza c2b0a3e
test filenames
antgonza b2b4d60
S000_L001_R1_001.counts.txt
antgonza 0051f49
{rec}{sid}
antgonza 1418cbc
rm touch for gz files
antgonza 62e388e
add_default_workflow
antgonza 27efe2d
fixing counts
antgonza a323f46
fix TestPipeline.test_generate_sample_information_files
antgonza 48010d3
restart changes
antgonza
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,66 @@ | ||
| from .Protocol import PacBio | ||
| from sequence_processing_pipeline.Pipeline import Pipeline | ||
| from .Assays import Metagenomic | ||
| from .Assays import ASSAY_NAME_METAGENOMIC | ||
| from .FailedSamplesRecord import FailedSamplesRecord | ||
| from .Workflows import Workflow | ||
| import pandas as pd | ||
| | ||
| | ||
| class PacBioMetagenomicWorkflow(Workflow, Metagenomic, PacBio): | ||
| def __init__(self, **kwargs): | ||
| super().__init__(**kwargs) | ||
| | ||
| self.mandatory_attributes = ['qclient', 'uif_path', | ||
| 'lane_number', 'config_fp', | ||
| 'run_identifier', 'output_dir', 'job_id', | ||
| 'is_restart'] | ||
| | ||
| self.confirm_mandatory_attributes() | ||
| | ||
| # second stage initializer that could conceivably be pushed down into | ||
| # specific children requiring specific parameters. | ||
| self.qclient = self.kwargs['qclient'] | ||
| | ||
| self.overwrite_prep_with_original = False | ||
| if 'overwrite_prep_with_original' in self.kwargs: | ||
| self.overwrite_prep_with_original = \ | ||
| self.kwargs['overwrite_prep_with_original'] | ||
| self.pipeline = Pipeline(self.kwargs['config_fp'], | ||
| self.kwargs['run_identifier'], | ||
| self.kwargs['uif_path'], | ||
| self.kwargs['output_dir'], | ||
| self.kwargs['job_id'], | ||
| ASSAY_NAME_METAGENOMIC, | ||
| lane_number=self.kwargs['lane_number']) | ||
| | ||
| self.fsr = FailedSamplesRecord(self.kwargs['output_dir'], | ||
| self.pipeline.sample_sheet.samples) | ||
| | ||
| samples = [ | ||
| {'barcode': sample['barcode_id'], | ||
| 'sample_name': sample['Sample_ID'], | ||
| 'project_name': sample['Sample_Project'], | ||
| 'lane': sample['Lane']} | ||
| for sample in self.pipeline.sample_sheet.samples] | ||
| df = pd.DataFrame(samples) | ||
| sample_list_fp = f"{self.kwargs['output_dir']}/sample_list.tsv" | ||
| df.to_csv(sample_list_fp, sep='\t', index=False) | ||
| | ||
| self.master_qiita_job_id = self.kwargs['job_id'] | ||
| | ||
| self.lane_number = self.kwargs['lane_number'] | ||
| self.is_restart = bool(self.kwargs['is_restart']) | ||
| | ||
| if self.is_restart is True: | ||
| self.determine_steps_to_skip() | ||
| | ||
| # this is a convenience member to allow testing w/out updating Qiita. | ||
| self.update = True | ||
| | ||
| if 'update_qiita' in kwargs: | ||
| if not isinstance(kwargs['update_qiita'], bool): | ||
| raise ValueError("value for 'update_qiita' must be of " | ||
| "type bool") | ||
| | ||
| self.update = kwargs['update_qiita'] |
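
For reviewers following the hand-off between the two files in this PR: the constructor above renames the sample-sheet keys (`barcode_id`, `Sample_ID`, `Sample_Project`, `Lane`) to the `sample_list.tsv` columns that the command-generation script later reads. The following is only a minimal sketch of that mapping. It uses the standard-library `csv` module rather than pandas to stay dependency-free, and the sample values and the helper names `to_sample_list_rows` and `write_tsv` are hypothetical, not part of the PR.

```python
import csv
import io

# Hypothetical sample-sheet rows; only the keys mirror the diff above.
sheet_samples = [
    {'barcode_id': 'bc2001', 'Sample_ID': 'sample_A',
     'Sample_Project': 'Project_1', 'Lane': 1},
    {'barcode_id': 'bc2002', 'Sample_ID': 'sample_B',
     'Sample_Project': 'Project_1', 'Lane': 1},
]

# Column names written to sample_list.tsv in the diff above.
COLUMNS = ['barcode', 'sample_name', 'project_name', 'lane']


def to_sample_list_rows(samples):
    """Rename sample-sheet keys to the sample_list.tsv column names."""
    return [{'barcode': s['barcode_id'],
             'sample_name': s['Sample_ID'],
             'project_name': s['Sample_Project'],
             'lane': s['Lane']} for s in samples]


def write_tsv(rows, handle):
    """Write rows as tab-separated text with a header and no index column."""
    writer = csv.DictWriter(handle, fieldnames=COLUMNS, delimiter='\t',
                            lineterminator='\n')
    writer.writeheader()
    writer.writerows(rows)


buffer = io.StringIO()
write_tsv(to_sample_list_rows(sheet_samples), buffer)
sample_list_text = buffer.getvalue()
print(sample_list_text)
```

The header row produced here is what the downstream script indexes by name (`row['barcode']`, `row['sample_name']`, and so on), so a rename on one side needs the matching rename on the other.
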
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,58 @@ | ||
| #!/usr/bin/env python | ||
| | ||
| # ----------------------------------------------------------------------------- | ||
| # Copyright (c) 2014--, The Qiita Development Team. | ||
| # | ||
| # Distributed under the terms of the BSD 3-clause License. | ||
| # | ||
| # The full license is in the file LICENSE, distributed with this software. | ||
| # ----------------------------------------------------------------------------- | ||
| import click | ||
| import pandas as pd | ||
| from glob import glob | ||
| from os import makedirs | ||
| | ||
| | ||
| @click.command() | ||
| @click.argument('sample_list', required=True) | ||
| @click.argument('run_folder', required=True) | ||
| @click.argument('outdir', required=True) | ||
| @click.argument('threads', required=True, default=1) | ||
| def generate_bam2fastq_commands(sample_list, run_folder, outdir, threads): | ||
| """Generates the bam2fastq commands""" | ||
| df = pd.read_csv(sample_list, sep='\t') | ||
| | ||
| # PacBio raw files live in a hifi_reads folder, within one of several | ||
| # folders (1_A01, 2_A02, etc.), within the run-id folder, and are named | ||
| # m[run-id]XXX.hifi_reads.[barcode].bam; to find the [barcode] we can | ||
| # split on '.' and take the second-to-last element [-2]. | ||
| files = {f.split('.')[-2]: f | ||
| for f in glob(f'{run_folder}/*/hifi_reads/*.bam')} | ||
| | ||
| makedirs(outdir, exist_ok=True) | ||
| | ||
| commands, missing_files = [], [] | ||
| for _, row in df.iterrows(): | ||
| bc = row['barcode'] | ||
| sn = row['sample_name'] | ||
| pn = row['project_name'] | ||
| lane = row['lane'] | ||
| if bc not in files: | ||
| missing_files.append(bc) | ||
| continue | ||
| od = f'{outdir}/{pn}' | ||
| | ||
| makedirs(od, exist_ok=True) | ||
| fn = f'{od}/{sn}_S000_L00{lane}_R1_001' | ||
| cmd = (f'bam2fastq -j {threads} -o {fn} -c 9 ' | ||
| f'{files[bc]}; ' | ||
| f'fqtools count {fn}.fastq.gz > ' | ||
| f'{fn}.counts.txt') | ||
| commands.append(cmd) | ||
| | ||
| if missing_files: | ||
| raise ValueError( | ||
| f'{run_folder} is missing barcodes: {missing_files}') | ||
| | ||
| for cmd in commands: | ||
| print(cmd) | ||
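
To make the loop above easier to review, here is a rough sketch of the same control flow with the filesystem and click pieces removed: build one command string per sample, collect every barcode that has no BAM file, and fail only after all rows have been checked. The function names `build_command` and `plan_commands`, the sample rows, and the paths are hypothetical. The `bam2fastq` and `fqtools` invocations are copied from the diff, not independently verified.

```python
def build_command(bam_fp, prefix, threads):
    """Compose the conversion plus read-count command for one sample."""
    return (f'bam2fastq -j {threads} -o {prefix} -c 9 {bam_fp}; '
            f'fqtools count {prefix}.fastq.gz > {prefix}.counts.txt')


def plan_commands(rows, bam_by_barcode, outdir, threads=1):
    """Return one command per row, or raise listing all missing barcodes."""
    commands, missing = [], []
    for row in rows:
        bc = row['barcode']
        if bc not in bam_by_barcode:
            # Keep going so the error reports every missing barcode at once.
            missing.append(bc)
            continue
        prefix = (f"{outdir}/{row['project_name']}/"
                  f"{row['sample_name']}_S000_L00{row['lane']}_R1_001")
        commands.append(build_command(bam_by_barcode[bc], prefix, threads))
    if missing:
        raise ValueError(f'missing barcodes: {missing}')
    return commands


# Hypothetical inputs for illustration only.
rows = [{'barcode': 'bc2001', 'sample_name': 'sample_A',
         'project_name': 'Project_1', 'lane': 1}]
bams = {'bc2001': 'run/1_A01/hifi_reads/m1.hifi_reads.bc2001.bam'}

for cmd in plan_commands(rows, bams, 'out', threads=4):
    print(cmd)
```

Raising after the loop, as the PR does, means nothing is printed for a run folder with any missing barcode, so a partial command list cannot be submitted by accident.
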
What is this split/slice grabbing from the file name?
Adding some comments about this.
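As a concrete illustration of the question above, this small sketch shows what `f.split('.')[-2]` returns for a file named following the `m[run-id]XXX.hifi_reads.[barcode].bam` pattern described in the code comment. The path and the barcode value `bc2001` are made-up examples, and `barcode_from_bam` is a hypothetical helper name rather than something in the PR.

```python
def barcode_from_bam(path):
    """Return the second-to-last dot-separated field of a BAM path."""
    return path.split('.')[-2]


# Hypothetical path following the naming pattern from the code comment.
example = ('run/1_A01/hifi_reads/'
           'm11111_20250101_111111_s4.hifi_reads.bc2001.bam')

parts = example.split('.')
print(parts)                      # the last two fields are the barcode and 'bam'
print(barcode_from_bam(example))
```

Because the index counts from the end, dots earlier in the path do not affect the result. A file without a barcode field would yield `hifi_reads` instead. Since the PR builds a dict keyed on this value, two folders containing the same barcode would leave only one entry.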