COMBINE_SCOREFILE erroring out on custom scorefiles that do not provide other_allele #337

Closed
ashenfernando1 opened this issue Jul 11, 2024 · 4 comments · Fixed by PGScatalog/pygscatalog#30
Labels
bug Something isn't working

@ashenfernando1

Description of the bug

I generated multiple custom scorefiles without an other_allele column. Running the pipeline on them produced the following error:

Command exit status:
  1

Command output:
  (empty)

Command error:
  pgscatalog.core.cli.combine_cli: 2024-07-11 16:14:15 DEBUG    Verbose logging enabled
  pgscatalog.core.cli.combine_cli: 2024-07-11 16:14:15 DEBUG    Compressing output with gzip
  
    0%|          | 0/54 [00:00<?, ?it/s]pgscatalog.core.cli.combine_cli: 2024-07-11 16:14:15 INFO     Processing custom_GCST001790
  
    0%|          | 0/54 [00:00<?, ?it/s]
  Traceback (most recent call last):
    File "/app/pgscatalog.utils/.venv/bin/pgscatalog-combine", line 8, in <module>
      sys.exit(run())
               ^^^^^
    File "/app/pgscatalog.core/src/pgscatalog/core/cli/combine_cli.py", line 65, in run
      normalised_score = list(
                         ^^^^^
    File "/app/pgscatalog.core/src/pgscatalog/core/lib/scorefiles.py", line 485, in normalise
      yield from normalise(
    File "/app/pgscatalog.core/src/pgscatalog/core/lib/_normalise.py", line 71, in check_duplicates
      for variant in variants:
    File "/app/pgscatalog.core/src/pgscatalog/core/lib/_normalise.py", line 299, in detect_complex
      for variant in variants:
    File "/app/pgscatalog.core/src/pgscatalog/core/lib/_normalise.py", line 280, in check_effect_allele
      for variant in variants:
    File "/app/pgscatalog.core/src/pgscatalog/core/lib/_normalise.py", line 160, in assign_other_allele
      if "/" in variant.other_allele:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  TypeError: argument of type 'NoneType' is not iterable

Line 160 of pgscatalog/core/lib/_normalise.py belongs to the assign_other_allele function, which reads:

    n_dropped = 0
    for variant in variants:
        if "/" in variant.other_allele:
            n_dropped += 1
            variant.other_allele = None

After hitting the error, I changed line 160 of _normalise.py to:

    for variant in variants:
        if variant.other_allele is None:
            n_dropped += 1
        # if "/" in variant.other_allele:
        #     n_dropped += 1
        #     variant.other_allele = None

With this change, the process completed successfully and scorefiles.txt.gz was created.

I wanted to check that this error is legitimate and that the fix is valid. (I could not work out how to access /app/pgscatalog.core/.../_normalise.py to see whether the pipeline proceeds to completion. If you can tell me how to modify the Dockerfile (presumably?), I can try the changes and report back.)
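
For reference, the TypeError arises because Python's `in` operator needs a container on its right-hand side, and other_allele is None when the column is absent. A minimal sketch of a guard that tolerates missing values while still clearing values that contain "/" might look like the following. The `Variant` dataclass is a hypothetical stand-in rather than the real pgscatalog class, and this is not necessarily what the merged patch does.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional


@dataclass
class Variant:
    """Hypothetical stand-in for a scoring file variant (not the pgscatalog class)."""

    effect_allele: str
    other_allele: Optional[str] = None


def assign_other_allele(variants: Iterable[Variant]) -> Iterator[Variant]:
    """Clear other_allele values that contain "/", tolerating missing values."""
    for variant in variants:
        # `"/" in None` raises TypeError, so test for None before using `in`
        if variant.other_allele is not None and "/" in variant.other_allele:
            variant.other_allele = None
        yield variant


variants = [Variant("A", "G"), Variant("T", "C/G"), Variant("C")]
cleaned = list(assign_other_allele(variants))
print([v.other_allele for v in cleaned])
```

Unlike the workaround above, which comments out the "/" check entirely, this version keeps the original behaviour for rows where other_allele is present.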

Command used and terminal output

$ nextflow run pgscatalog/pgsc_calc -profile docker -r v2.0.0-beta --input ... --target_build GRCh38 --scorefile ".../*.txt"

Relevant files

No response

System information

No response

@nebfield
Member

nebfield commented Jul 12, 2024

Thanks for the bug report and your investigation 😄

I've prepared a patch to fix the problem.

Modifying Python code in a Docker container is a little tricky. If you want to edit the Python code, you could try using the conda profile and changing the local conda environments. The Dockerfiles are in the linked repository if you're happy to try building the image. Otherwise, once the PR is merged I'll integrate the fix into the dev branch of pgsc_calc (sometime next week probably, people are on holidays 🌴).
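
When working in a local environment such as a conda one, a quick way to find the file to edit is to ask Python where the module lives. The sketch below uses only the standard library. The module name is taken from the traceback earlier in this thread, and `locate_module` is a made-up helper name for illustration.

```python
import importlib.util
from typing import Optional


def locate_module(name: str) -> Optional[str]:
    """Return the source file path of an importable module, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when a parent package (e.g. pgscatalog) is not installed
        return None
    return spec.origin if spec is not None else None


# Run inside the activated environment; prints None if the package is missing
print(locate_module("pgscatalog.core.lib._normalise"))
```

Note that `find_spec` imports the parent packages of a dotted name, so it should be run with the same interpreter the pipeline uses.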

@ashenfernando1
Author

Brilliant, thanks!

I ended up adding the second of these two lines to $HOME/.nextflow/assets/pgscatalog/pgsc_calc/conf/modules.config:

ext.docker = 'ghcr.io/pgscatalog/pygscatalog'
containerOptions = '-v /usr/local/lib/python3.10/dist-packages/pgscatalog/:/app/pgscatalog.core/src/pgscatalog/'

This let me mount my local copy of the pgscatalog package into the Docker container.

I copied the code from the patch (only the changes to _normalise.py; the others looked like they were for testing?). The COMBINE_SCOREFILES process now completes, but the pipeline fails at the MATCH_VARIANTS step:

Caused by:
  Process `PGSCATALOG_PGSCCALC:PGSCCALC:MATCH:MATCH_VARIANTS (gsavplink chromosome ALL)` terminated with an error exit status (1)

Command executed:

  export POLARS_MAX_THREADS=2
  
  pgscatalog-match                  --dataset gsavplink         --scorefile scorefiles.txt.gz         --target GRCh38_gsavplink_ALL.pvar.zst         --only_match                                    --outdir $PWD         -v
  
  cat <<-END_VERSIONS > versions.yml
  MATCH_VARIANTS:
      pgscatalog.match: $(echo $(python -c 'import pgscatalog.match; print(pgscatalog.match.__version__)'))
  END_VERSIONS

Command exit status:
  1

Command output:
  (empty)

Command error:
  pgscatalog.match.cli.match_cli: 2024-07-12 14:50:42 WARNING  No output format specified, writing to combined scoring file
  pgscatalog.match.cli.match_cli: 2024-07-12 14:50:42 DEBUG    Verbose logging enabled
  pgscatalog.match.cli.match_cli: 2024-07-12 14:50:42 INFO     --cleanup set (default), temporary files will be deleted
  pgscatalog.match.lib.scoringfileframe: 2024-07-12 14:50:42 DEBUG    Converting ScoringFileFrame(NormalisedScoringFile('scorefiles.txt.gz')) to feather format
  pgscatalog.match.lib.scoringfileframe: 2024-07-12 14:50:42 DEBUG    ScoringFileFrame(NormalisedScoringFile('scorefiles.txt.gz')) feather conversion complete
  pgscatalog.match.lib._match.preprocess: 2024-07-12 14:50:42 DEBUG    Complementing column effect_allele
  pgscatalog.match.lib._match.preprocess: 2024-07-12 14:50:42 DEBUG    Complementing column other_allele
  pgscatalog.match.lib.variantframe: 2024-07-12 14:50:42 DEBUG    Converting VariantFrame(path='GRCh38_gsavplink_ALL.pvar.zst', dataset='gsavplink', chrom=None, cleanup=True, tmpdir=PosixPath('tmp')) to feather format
  Traceback (most recent call last):
    File "/app/pgscatalog.utils/.venv/bin/pgscatalog-match", line 8, in <module>
      sys.exit(run_match())
               ^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/cli/match_cli.py", line 87, in run_match
      ipc_path = get_match_candidates(
                 ^^^^^^^^^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/cli/match_cli.py", line 124, in get_match_candidates
      with variants as target_df:
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/variantframe.py", line 54, in __enter__
      self.arrowpaths = loose(self.variants, tmpdir=self._tmpdir)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/usr/local/lib/python3.11/functools.py", line 909, in wrapper
      return dispatch(args[0].__class__)(*args, **kw)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/_arrow.py", line 94, in _
      return batch_read(reader, tmpdir=tmpdir, cols_keep=cols_keep)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/pgscatalog/match/lib/_arrow.py", line 102, in batch_read
      batches = reader.next_batches(batch_size)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    File "/app/pgscatalog.utils/.venv/lib/python3.11/site-packages/polars/io/csv/batched_reader.py", line 134, in next_batches
      batches = self._reader.next_batches(n)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  polars.exceptions.ComputeError: found more fields than defined in 'Schema'
  
  Consider setting 'truncate_ragged_lines=True'.

Digging into this more and will report back with results.
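
The polars error suggests that at least one row of the target variant file has more delimited fields than the schema expects. A rough way to look for such rows in a tab-delimited text file is sketched below with the standard library only. It assumes the file has already been decompressed to plain text and that lines starting with "##" are metadata to skip; `find_ragged_lines` is a hypothetical helper, not part of pgscatalog or polars.

```python
import io
from typing import Iterable, List, Tuple


def find_ragged_lines(lines: Iterable[str], sep: str = "\t") -> List[Tuple[int, int]]:
    """Return (line_number, n_fields) for rows wider than the header row."""
    n_expected = None
    ragged = []
    for line_number, line in enumerate(lines, start=1):
        if line.startswith("##") or not line.strip():
            continue  # assumption: "##" lines are metadata, not data
        n_fields = len(line.rstrip("\n").split(sep))
        if n_expected is None:
            n_expected = n_fields  # first non-metadata line is treated as the header
        elif n_fields > n_expected:
            ragged.append((line_number, n_fields))
    return ragged


example = io.StringIO(
    "##source=example\n"
    "#CHROM\tPOS\tID\tREF\tALT\n"
    "1\t100\trs1\tA\tG\n"
    "1\t200\trs2\tC\tT\textra\n"
)
print(find_ragged_lines(example))
```

The option named in the error message, truncate_ragged_lines=True, would drop the extra fields rather than report them, so a check like this can help decide whether the extra fields matter before silencing the error.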

Happy holidays! 🌴

@nebfield
Member

nebfield commented Jul 12, 2024

Great 🥳 The polars problem should be fixed in the latest release, so if you update, clear the cache (rm -r work), and reapply your patch it should be OK.

@ashenfernando1
Author

Pipeline completed on v2.0.0-beta.1! Thanks for your help on this. When the patch gets approved, presumably there would be another release? Feel free to close the issue whenever.
