doc update

Ensembl · Oct 28, 2024 · 1db7b8d
1 parent 88a5a5c

Showing 3 changed files with 17 additions and 5 deletions.
diff --git a/README.md b/README.md
@@ -14,7 +14,10 @@ This pipeline processes transcriptomic data for various taxon IDs, performing a
 
 4. **Run STAR Alignment**: Align the subsampled FASTQ files to the provided genome assembly using the STAR aligner, then store the results into the database.
 
+## Batching
 
+Batching is available for species with a large amount of RNA-seq data. Batches can be created with `src/python/ensembl/genes/metadata/transcriptomic/check_for_transcriptomic_batch.py`: given a taxon ID and a batch size, the script retrieves the list of run accessions and splits them into multiple `.txt` files according to the batch size.
+The pipeline uses the date of the last processed batch as the last checked date for future updates.
 
 ### Mandatory arguments
 
@@ -25,6 +28,12 @@ The structure of the file can change according to the running options
 | taxon_id,gca (header)   | 
 | <taxon_id>,<gca>        |
 
+When batching, use the following format:
+| csv file format |
+|-----------------|
+| taxon_id,gca,runs_file (header)           | 
+| <taxon_id>,<gca>,<path to the batch file> |
+
 
 #### `--outDir`
 Path to the directory where to store the results of the pipeline

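The batching step described in the README can be pictured with the minimal Python sketch below. It is not the code of `check_for_transcriptomic_batch.py`: retrieving the run accessions for a taxon ID is left out (placeholder accessions are used instead), and the helper names `split_into_batches`, `write_batch_files` and `write_pipeline_csv`, the `batch_<n>.txt` naming and the one-accession-per-line layout are all hypothetical. The README does not say whether all batches go into one CSV or each batch gets its own, so the helper simply writes one `taxon_id,gca,runs_file` row per batch file it is given; passing a single file yields a one-batch CSV.

```python
import csv
import tempfile
from pathlib import Path


def split_into_batches(runs, batch_size):
    """Split run accessions into consecutive chunks of at most batch_size runs."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    return [runs[i:i + batch_size] for i in range(0, len(runs), batch_size)]


def write_batch_files(batches, out_dir, prefix="batch"):
    """Write each batch to <out_dir>/<prefix>_<n>.txt, one accession per line.

    The file naming and layout are assumptions, not the real script's output.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for n, batch in enumerate(batches, start=1):
        path = out_dir / f"{prefix}_{n}.txt"
        path.write_text("\n".join(batch) + "\n")
        paths.append(path)
    return paths


def write_pipeline_csv(taxon_id, gca, batch_files, csv_path):
    """Write a --csvFile with the batching header: one row per batch file."""
    with open(csv_path, "w", newline="") as handle:
        writer = csv.writer(handle)
        writer.writerow(["taxon_id", "gca", "runs_file"])
        for path in batch_files:
            writer.writerow([taxon_id, gca, str(path)])


if __name__ == "__main__":
    # Placeholder accessions; the real script retrieves them for the taxon ID.
    runs = [f"SRR{n:07d}" for n in range(1, 8)]
    out_dir = Path(tempfile.mkdtemp())
    files = write_batch_files(split_into_batches(runs, 3), out_dir)
    # Example values for the taxon ID and assembly accession.
    write_pipeline_csv(9606, "GCA_000001405.29", files, out_dir / "batches.csv")
    print((out_dir / "batches.csv").read_text())
```

Keeping the chunking in a pure function makes the batch-size logic easy to check independently of the file system.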
diff --git a/pipelines/nextflow/workflows/short_read.nf b/pipelines/nextflow/workflows/short_read.nf
@@ -73,11 +73,14 @@ if (params.help) {
     log.info '  --transcriptomic_dbhost STR                   Db host server '
     log.info '  --transcriptomic_dbport INT                   Db port  '
     log.info '  --transcriptomic_dbuser STR                   Db user  '
-    log.info '  --transcriptomic_dbpassword STR                   Db password  '
-    log.info '  --user_r STR                 Db user read_only'
-    log.info '  --enscode STR                Enscode path '
-    log.info '  --outDir STR                 Output directory. Default is workDir'
-    log.info '  --csvFile STR                Path for the csv containing the db name' 
+    log.info '  --transcriptomic_dbpassword STR               Db password  '
+    log.info '  --enscode STR                                 Enscode path '
+    log.info '  --outDir STR                                  Output directory. Default is workDir'
+    log.info '  --csvFile STR                                 Path for the csv containing the db name' 
+    log.info '  --cacheDir STR                                Path to the directory to use as cache for the intermediate files'
+    log.info '  --files_latency INT                           Sleep time (in seconds) after the genome and proteins have been fetched (default: 60 seconds)'
+    log.info '  --backupDB BOOL                               Dump the db and save it in a zipped file'
+    log.info '  --cleanOutputDir BOOL                         Remove all files in the output directory except the db dump file'
     exit 1
 }
 

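The help text describes `--cleanOutputDir` only as removing every file in the output directory except the db dump. A hypothetical Python sketch of that behavior follows; the pipeline itself implements it in Nextflow, and the function name `clean_output_dir` and the dump file name used here are assumptions made for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def clean_output_dir(out_dir, keep_name):
    """Remove every top-level entry of out_dir except the one named keep_name.

    Returns the sorted names of the removed entries.
    """
    out_dir = Path(out_dir)
    removed = []
    # Materialise the listing first so deletions do not disturb iteration.
    for entry in list(out_dir.iterdir()):
        if entry.name == keep_name:
            continue  # keep the db dump (hypothetical name supplied by the caller)
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)  # remove sub-directories recursively
        else:
            entry.unlink()  # plain files and symlinks
        removed.append(entry.name)
    return sorted(removed)


if __name__ == "__main__":
    out_dir = Path(tempfile.mkdtemp())
    (out_dir / "transcriptomic_db.sql.gz").write_bytes(b"")  # hypothetical dump name
    (out_dir / "star_output").mkdir()
    (out_dir / "star_output" / "Aligned.out.bam").write_bytes(b"")
    print(clean_output_dir(out_dir, "transcriptomic_db.sql.gz"))
    print(sorted(p.name for p in out_dir.iterdir()))
```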
diff --git a/plot.jpeg b/plot.jpeg