ERROR: No enum auxiliary type exists. At src/slow5.c:1458 #9

Open
cabbagesofdoom opened this issue Dec 6, 2023 · 24 comments
Comments

@cabbagesofdoom

Hi @Psy-Fer,

I am trying to convert some blow5 files to pod5 and get this error:

[slow5_get_aux_enum_labels::ERROR] No enum auxiliary type exists. At src/slow5.c:1458
06-Dec-23 14:45:55 - pyslow5 - [WARNING]: get_aux_enum_labels enum_labels is NULL
06-Dec-23 14:45:55 - pyslow5 - [WARNING]: get_header_value header value not found: ip_address - rg: 0
06-Dec-23 14:45:55 - pyslow5 - [WARNING]: get_header_value header value not found: local_bc_comp_model - rg: 0
06-Dec-23 14:45:55 - pyslow5 - [WARNING]: get_header_value header value not found: mac_address - rg: 0
Traceback (most recent call last):
  File "/srv/scratch/babsgenome/snakes/blow5/tiger/blue-crab-venv/bin/blue-crab", line 8, in <module>
    sys.exit(main())
  File "/srv/scratch/babsgenome/snakes/blow5/tiger/blue-crab-venv/lib/python3.10/site-packages/src/blue_crab.py", line 1529, in main
    slow52pod5(args)
  File "/srv/scratch/babsgenome/snakes/blow5/tiger/blue-crab-venv/lib/python3.10/site-packages/src/blue_crab.py", line 713, in slow52pod5
    s2s_s2p_worker(args, sfile, pod5_out)
  File "/srv/scratch/babsgenome/snakes/blow5/tiger/blue-crab-venv/lib/python3.10/site-packages/src/blue_crab.py", line 1161, in s2s_s2p_worker
    s5_end_reason = slow5_end_reason_labels[read.get("end_reason", 0)]
IndexError: list index out of range

Any ideas of what might cause this and how I might fix it?

Thanks!

Rich

@Psy-Fer
Owner

Psy-Fer commented Dec 6, 2023

Oh yea that's my bad.

Let me fix that and get back to you.

@Psy-Fer
Owner

Psy-Fer commented Dec 6, 2023

Hey,

Any chance you could show me the header columns of your data?

The first 2 lines above the actual reads (and below the header values)

It should be something like

#char*  uint32_t        double  double  double  double  uint64_t        int16_t*        enum{unknown,partial,mux_change,unblock_mux_change,data_service_unblock_mux_change,signal_positive,signal_negative}     char*   double  int32_t uint8_t uint64_t
#read_id        read_group      digitisation    offset  range   sampling_rate   len_raw_signal  raw_signal      end_reason      channel_number  median_before   read_number     start_mux       start_time

What I'm looking for here is this part of it

enum{unknown,partial,mux_change,unblock_mux_change,data_service_unblock_mux_change,signal_positive,signal_negative}

This is the list that blue-crab tried to get from your slow5 file. If it's not present or fails, it tries to make it a list of just ["unknown"].

It looks like it's trying to use a value that is outside the length of the list. So having a look at the list is a good start to see if there is anything weird going on there.
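The failure mode described here can be reproduced without any SLOW5 file. This is a minimal sketch, assuming the label list came back empty or shorter than the index stored in the read; `safe_end_reason` is a hypothetical helper for illustration, not blue-crab's actual fix.

```python
def safe_end_reason(labels, index, default="unknown"):
    """Return labels[index], or `default` when the list is missing or too short."""
    # Hypothetical helper for illustration; not part of blue-crab or pyslow5.
    if not labels or index is None or not 0 <= index < len(labels):
        return default
    return labels[index]


# Labels as they appear in a file that carries the end_reason enum.
labels = ["unknown", "partial", "mux_change", "unblock_mux_change",
          "data_service_unblock_mux_change", "signal_positive", "signal_negative"]

print(safe_end_reason(labels, 5))  # index within range
print(safe_end_reason([], 0))      # empty list: plain indexing would raise IndexError
print(safe_end_reason(None, 0))    # labels could not be read at all
```

Falling back to "unknown" mirrors the default described above; whether that is the right default for a given dataset is a judgment call.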

An easy way to get that value from a blow5 file is to run this command

slow5tools view reads.blow5 | less

and just scroll down to that header line and copy paste it here.
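If the header is long, the same check can be scripted once the two column lines have been copied out of the `slow5tools view` output. A rough sketch, assuming whitespace-separated fields; `column_types` and `enum_labels` are made-up helper names, and the sample lines are the ones quoted earlier in this comment.

```python
def column_types(types_line, names_line):
    """Map each column name to its declared type from the two '#' column lines."""
    # Hypothetical helper; assumes fields are separated by tabs or spaces.
    types = types_line.lstrip("#").split()
    names = names_line.lstrip("#").split()
    if len(types) != len(names):
        raise ValueError(f"{len(types)} types but {len(names)} column names")
    return dict(zip(names, types))


def enum_labels(type_string):
    """Return the labels of an 'enum{a,b,c}' type, or None for anything else."""
    if type_string and type_string.startswith("enum{") and type_string.endswith("}"):
        return type_string[len("enum{"):-1].split(",")
    return None


types_line = "#" + "\t".join([
    "char*", "uint32_t", "double", "double", "double", "double", "uint64_t",
    "int16_t*",
    "enum{unknown,partial,mux_change,unblock_mux_change,"
    "data_service_unblock_mux_change,signal_positive,signal_negative}",
    "char*", "double", "int32_t", "uint8_t", "uint64_t",
])
names_line = "#" + "\t".join([
    "read_id", "read_group", "digitisation", "offset", "range", "sampling_rate",
    "len_raw_signal", "raw_signal", "end_reason", "channel_number",
    "median_before", "read_number", "start_mux", "start_time",
])

cols = column_types(types_line, names_line)
# None here would mean the file has no end_reason enum column.
print(enum_labels(cols.get("end_reason")))
```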

Thanks
James

@Psy-Fer
Owner

Psy-Fer commented Dec 6, 2023

I have also just pushed a change to the dev branch that adds a check on this line of code. If the lookup fails, it prints what slow5_end_reason_labels is set to, as a quick way to troubleshoot.

So another way is to switch to the dev branch, run pip install . again then try running the same conversion again and wait for it to hit the same error.

Thanks
James

@killidude

Hi @Psy-Fer,

I am also getting the same error:

blue-crab s2p minion_sim_1000_itrs.blow5 -o minion_test.pod5
05-Apr-24 19:03:38 - blue-crab - [INFO]: single2single: 1 s/blow5 file detected as input. Writing 1:1 s/blow5->pod5 to file: minion_test.pod5
05-Apr-24 19:03:38 - blue-crab - [INFO]: Opening s/blow5 file: minion_sim_1000_itrs.blow5
[slow5_get_aux_enum_labels::ERROR] No enum auxiliary type exists. At src/slow5.c:1458
05-Apr-24 19:03:38 - pyslow5 - [WARNING]: get_aux_enum_labels enum_labels is NULL
Traceback (most recent call last):
  File "/home/tomas/.local/bin/blue-crab", line 8, in <module>
    sys.exit(main())
  File "/home/tomas/.local/lib/python3.10/site-packages/src/blue_crab.py", line 1529, in main
    slow52pod5(args)
  File "/home/tomas/.local/lib/python3.10/site-packages/src/blue_crab.py", line 713, in slow52pod5
    s2s_s2p_worker(args, sfile, pod5_out)
  File "/home/tomas/.local/lib/python3.10/site-packages/src/blue_crab.py", line 1161, in s2s_s2p_worker
    s5_end_reason = slow5_end_reason_labels[read.get("end_reason", 0)]
IndexError: list index out of range

The last two lines before the actual reads are:

#char* uint32_t double double double double uint64_t int16_t* char* double int32_t uint8_t uint64_t
#read_id read_group digitisation offset range sampling_rate len_raw_signal raw_signal channel_number median_before read_number start_mux start_time

There is no enum{} in my files.

The slow5 files were generated (using the subprocess.run function of python) with the dna-r10-min model and full-contigs:

"squigulator " + fasta_derep + " -x dna-r10-min -o ./tmp/tmp_" + str(i) + ".slow5 --full-contig --seed " + str(random_numbers[i])
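For anyone reproducing this, the same invocation can be built as an argument list rather than a concatenated string, which avoids quoting problems with paths. A small sketch using only the flags shown above; `squigulator_cmd` and the file names are placeholders invented for the example.

```python
import shlex


def squigulator_cmd(fasta, out_path, seed, profile="dna-r10-min"):
    """Build the squigulator argument list used in this thread."""
    # Hypothetical helper; only flags that appear in the command above are used.
    return ["squigulator", str(fasta), "-x", profile, "-o", str(out_path),
            "--full-contig", "--seed", str(seed)]


cmd = squigulator_cmd("ref.fasta", "./tmp/tmp_0.slow5", 42)
print(shlex.join(cmd))  # shell-quoted form, handy for logging
# To actually run it: subprocess.run(cmd, check=True)
```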

and then merged:

slow5tools merge tmp -o minion_sim_1000_itrs.slow5

The individual tmp files as well as the merged files have the same structure and no enum{} on line 9.

I tried the dna-r9-min model, and there also is no enum{} on line 9.

#char* uint32_t double double double double uint64_t int16_t* char* double int32_t uint8_t uint64_t
#read_id read_group digitisation offset range sampling_rate len_raw_signal raw_signal channel_number median_before read_number start_mux start_time

Thanks,

Tomas

@Psy-Fer
Owner

Psy-Fer commented Apr 9, 2024

Ahh so these reads were built with squigulator?

I'll need to tell @hasindu2008 to put a dummy end_reason in the blow5 output.

In the meantime, I'll modify blue-crab to insert a dummy enum via an argument, making all reads end in the signal_positive state.

I'll get back to you in a sec

James

@Psy-Fer
Owner

Psy-Fer commented Apr 9, 2024

Hi Tomas,

Could you please try using the dev branch and showing me the error it gives you?

You can do this by activating your environment.
If you installed with pip from PyPI, please clone the blue-crab repo:
git clone git@github.com:Psy-Fer/blue-crab.git
Then go to the blue-crab repo and run git pull,
then git switch dev.
You can check it worked by running git status; it should say something like

On branch dev
Your branch is up to date with 'origin/dev'.

Then re-install this dev version into your env

pip install .

Now re-run your blue-crab command.

Something fishy is going on, but this should figure it out.

Cheers,
James

@killidude

Hi James,

Thanks for looking into this.

Here is the output of the dev branch:

blue-crab s2p minion_sim_1000_itrs.blow5 -o minion_test.pod5
10-Apr-24 09:04:21 - blue-crab - [INFO]: single2single: 1 s/blow5 file detected as input. Writing 1:1 s/blow5->pod5 to file: minion_test.pod5
10-Apr-24 09:04:21 - blue-crab - [INFO]: Opening s/blow5 file: minion_sim_1000_itrs.blow5
[slow5_get_aux_enum_labels::ERROR] No enum auxiliary type exists. At src/slow5.c:1458
10-Apr-24 09:04:21 - pyslow5 - [WARNING]: get_aux_enum_labels enum_labels is NULL
Traceback (most recent call last):
  File "/home/tomas/.local/bin/blue-crab", line 8, in <module>
    sys.exit(main())
  File "/home/tomas/.local/lib/python3.10/site-packages/src/blue_crab.py", line 1558, in main
    slow52pod5(args)
  File "/home/tomas/.local/lib/python3.10/site-packages/src/blue_crab.py", line 713, in slow52pod5
    s2s_s2p_worker(args, sfile, pod5_out)
  File "/home/tomas/.local/lib/python3.10/site-packages/src/blue_crab.py", line 1364, in s2s_s2p_worker
    read_id=uuid.UUID(read["read_id"]),
  File "/usr/lib/python3.10/uuid.py", line 177, in __init__
    raise ValueError('badly formed hexadecimal UUID string')
ValueError: badly formed hexadecimal UUID string

@Psy-Fer
Owner

Psy-Fer commented Apr 10, 2024

Ahh progress!

Okay so now the issue is the readID isn't a valid uuid. Again I think that's a squigulator issue.

@hasindu2008 what are the readIDs you make?

The issue here is that pod5 requires the readID to be a uuid. So I can't just use any old string.
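A quick way to check whether the read IDs in a file would survive this step is to try parsing them with the standard library. A minimal sketch; `is_uuid` is a hypothetical helper and both sample IDs are made up.

```python
import uuid


def is_uuid(read_id):
    """Return True if `read_id` parses as a UUID string."""
    try:
        uuid.UUID(read_id)
    except (ValueError, AttributeError, TypeError):
        return False
    return True


print(is_uuid("0a1b2c3d-4e5f-6789-abcd-ef0123456789"))  # well-formed UUID
print(is_uuid("read_0001"))  # arbitrary string, fails the same way as the traceback above
```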

Ideally squigulator would create these and then blue-crab just reads the string and converts it.

Another option: in the absence of valid UUIDs, I add an option to create them. But then you can't link the old reads to the new reads (unless I also write a TSV file that provides the mapping).

What do you think?

@killidude

James,

I agree, the solution is to have valid UUIDs and a dummy end_reason in the slow5/blow5 output generated by squigulator @hasindu2008.

Not having this also most likely breaks the buttery-eel wrapper.

Ultimately, I need to be able to basecall the simulated slow5/blow5 files generated by squigulator so I can use the called fastq files for downstream analyses.

Thanks,

Tomas

@Psy-Fer
Owner

Psy-Fer commented Apr 10, 2024

I can unbreak buttery-eel by using dummy UUIDs when I basecall and then restoring the original readID when the read comes back.

The issue is that when going over to pod5 you can't do this because of their strict typing. So yeah, either squigulator produces UUIDs or I create them in blue-crab and provide a file that maps squigulator readIDs to UUIDs.
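The mapping-file option could look roughly like the sketch below. It is only an illustration under stated assumptions: the UUIDs are derived with `uuid.uuid5` so the same read ID always maps to the same UUID (a choice made for this example, not something decided here), the namespace is arbitrary, and `uuid_for`, `write_mapping`, and the column names are invented.

```python
import csv
import io
import uuid

# Assumption: any fixed namespace works; this one is arbitrary and made up.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_DNS, "example.invalid")


def uuid_for(read_id):
    """Derive a repeatable UUID from an arbitrary read ID string."""
    return uuid.uuid5(NAMESPACE, read_id)


def write_mapping(read_ids, handle):
    """Write a two-column TSV linking original read IDs to their UUIDs."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["original_read_id", "uuid"])
    for rid in read_ids:
        writer.writerow([rid, str(uuid_for(rid))])


buf = io.StringIO()  # stands in for an open .tsv file
write_mapping(["read_0001", "read_0002"], buf)
print(buf.getvalue())
```

Random UUIDs from `uuid.uuid4()` would work just as well for pod5; the TSV is what keeps the link back to the original IDs either way.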

Let's see what @hasindu2008 thinks and then we will implement it asap

James

@hasindu2008
Collaborator

Hey all,

The reason I adhere to the current readID format in squigulator is so that it is compatible with the "mapeval" utility in Minimap2's Paftools companion script. This is quite useful for assessing the mapping accuracy once the reads are basecalled. Also, I like deterministic read IDs compared to random ones.

It is very strange that POD5 needs the read ID to be a UUID. Perhaps in their implementation they simply store the UUID as a 128-bit integer instead of a variable-length string. This is not great: it means POD5 is either stuck with UUIDs as read IDs forever, or has to change later and break backward compatibility. The read ID in many bioinformatics formats, including BAM, has been a variable-length string.

Perhaps I can implement an option in squigulator called --ont-friendly that produces fake UUIDs for the read IDs, as well as a fake end_reason with the value "unknown". Let me know your thoughts on this. This way, there is no need for blue-crab to do any "UUIDification" of the read IDs. If you are all happy, I can implement this in squigulator ASAP.

By the way, @Psy-Fer, is this UUID thing applicable to buttery-eel too? It wasn't a problem when using ont-guppy-server with the eel. Perhaps they enforced this UUID in ont-dorado-server? If they have enforced it (which is of limited sense to me), I would be very glad if you could do some internal mapping with a fake uuid when sending to the ont-basecall-server, but write the original readID to the FASTQ/SAM.

@hasindu2008
Collaborator

Also cross-referencing to the issue in squigulator that raises the same issue: hasindu2008/squigulator#13

@Psy-Fer
Owner

Psy-Fer commented Apr 11, 2024

Hey,

Okay I'll just make absolutely sure what pod5 is doing so we are 100% correct when we do this.

James

@killidude

@hasindu2008 and @Psy-Fer

Perhaps I (@hasindu2008) can implement an option in squigulator called --ont-friendly that produces fake UUIDs for the read IDs, as well as a fake end_reason with the value "unknown". Let me know your thoughts on this. This way, there is no need for blue-crab to do any "UUIDification" of the read IDs. If you are all happy, I can implement this in squigulator ASAP.

I think this is a great solution that will maintain maximum compatibility for downstream use.

Thanks,

Tomas

@Psy-Fer
Owner

Psy-Fer commented Apr 12, 2024

Okay I have confirmed that pod5 requires a uuid type for the readID, even though it shouldn't have to be.

--s2p--
verbose=1
-------------------blue-crab version-------------------
SLOW5/BLOW5 <-> POD5 converter version: 0.1.0

-------------------testcase:1: .slow5 to .pod5-------------------
12-Apr-24 17:35:30 - blue-crab - [INFO]: single2single: 1 s/blow5 file detected as input. Writing 1:1 s/blow5->pod5 to file: ./test//data/out/s2p/a.pod5
12-Apr-24 17:35:30 - blue-crab - [INFO]: Opening s/blow5 file: ./test//data/raw/s2p/a.slow5
12-Apr-24 17:35:30 - pyslow5 - [WARNING]: get_header_value header value not found: ip_address - rg: 0
12-Apr-24 17:35:30 - pyslow5 - [WARNING]: get_header_value header value not found: mac_address - rg: 0
Traceback (most recent call last):
  File "/home/jamfer/pvenv/blue-crab-test/bin/blue-crab", line 8, in <module>
    sys.exit(main())
  File "/home/jamfer/pvenv/blue-crab-test/lib/python3.8/site-packages/src/blue_crab.py", line 1561, in main
    slow52pod5(args)
  File "/home/jamfer/pvenv/blue-crab-test/lib/python3.8/site-packages/src/blue_crab.py", line 713, in slow52pod5
    s2s_s2p_worker(args, sfile, pod5_out)
  File "/home/jamfer/pvenv/blue-crab-test/lib/python3.8/site-packages/src/blue_crab.py", line 1392, in s2s_s2p_worker
    writer.add_read(read)
  File "/home/jamfer/pvenv/blue-crab-test/lib/python3.8/site-packages/pod5/writer.py", line 256, in add_read
    self.add_reads([read])
  File "/home/jamfer/pvenv/blue-crab-test/lib/python3.8/site-packages/pod5/writer.py", line 292, in add_reads
    *self._prepare_add_reads_args(reads),
  File "/home/jamfer/pvenv/blue-crab-test/lib/python3.8/site-packages/pod5/writer.py", line 306, in _prepare_add_reads_args
    [np.frombuffer(read.read_id.bytes, dtype=np.uint8) for read in reads]
  File "/home/jamfer/pvenv/blue-crab-test/lib/python3.8/site-packages/pod5/writer.py", line 306, in <listcomp>
    [np.frombuffer(read.read_id.bytes, dtype=np.uint8) for read in reads]
AttributeError: 'str' object has no attribute 'bytes'
testcase 1 failed

This is what happens if we just pass a str.

It's trying to access the bytes attribute of the uuid type specifically, as that is what they expect.
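The difference is easy to see with the standard library alone. A short sketch; the read ID value is made up.

```python
import uuid

read_id_str = "0a1b2c3d-4e5f-6789-abcd-ef0123456789"  # made-up example value
read_id = uuid.UUID(read_id_str)

# A uuid.UUID exposes its 16 raw bytes, which is what the writer reads.
print(len(read_id.bytes))

# A plain str has no such attribute, hence the AttributeError above.
print(hasattr(read_id_str, "bytes"))

# The bytes round-trip back to the same UUID.
print(uuid.UUID(bytes=read_id.bytes) == read_id)
```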

So yea, I think we need to go with dummy uuids, and just make a tsv file that maps the uuid with the more verbose read information you want to store.

James

@hasindu2008
Collaborator

@Psy-Fer I am implementing an option in squigulator to generate uuids for readids, so blue-crab does not need to do anything.

Please check if the buttery-eel is also broken due to this uuid thing?

@Psy-Fer
Owner

Psy-Fer commented Apr 12, 2024

Buttery-eel should be fine, unless they change something in the dorado server code
Psy-Fer/buttery-eel#32
I used to think it was an issue, but it turned out to be just a change in how dorado-server handles reads that are too short.

@Psy-Fer
Owner

Psy-Fer commented Apr 12, 2024

I should probably merge the buttery-eel/skipped branch into main and do a release to handle this.

@hasindu2008
Collaborator

@killidude

If you compile squigulator from the dev branch, and specify the option --ont-friendly=yes it should be pod5 conversion compatible.

When you specify --ont-friendly=yes it will add a dummy end_reason and create fake UUIDs for the read IDs.

If you encounter issues let me know, thanks.

Seems like buttery-eel works even without things being uuid as James mentioned above.

@killidude

@hasindu2008,

Thanks for implementing this option. I can now convert the squigulator generated files to pod5.

Thanks for your help,

Tomas

@denisbeslic

Hi @Psy-Fer ,
I'm using squigulator (v0.4.0) with the --ont-friendly=yes parameter and blue-crab (v0.2.0):
The error occurs during the conversion of a squigulator .slow5 file to .pod5. Here’s the error traceback:

04-Oct-24 16:12:25 - blue-crab - [INFO]: single2single: 1 s/blow5 file detected as input. Writing 1:1 s/blow5->pod5 to file: test.pod5
04-Oct-24 16:12:25 - blue-crab - [INFO]: Opening s/blow5 file: squigulator_reads.slow5
Traceback (most recent call last):
  File "/X.local/bin/blue-crab", line 8, in <module>
    sys.exit(main())
  File "/X/.local/lib/python3.8/site-packages/src/blue_crab.py", line 1562, in main
    slow52pod5(args)
  File "/X/.local/lib/python3.8/site-packages/src/blue_crab.py", line 717, in slow52pod5
    s2s_s2p_worker(args, sfile, pod5_out)
  File "/X/.local/lib/python3.8/site-packages/src/blue_crab.py", line 1195, in s2s_s2p_worker
    reason, forced = s2p_end_reason_convert(s5_end_reason)
  File "/X/.local/lib/python3.8/site-packages/src/blue_crab.py", line 94, in s2p_end_reason_convert
    "api_request": (p5.EndReasonEnum.API_REQUEST, False),
  File "/usr/lib/python3.8/enum.py", line 384, in __getattr__
    raise AttributeError(name) from None
AttributeError: API_REQUEST

I suspect this issue might be related to a recent pull request based on the new pod5 spec from about a month ago. Is there a way to avoid this error?
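For context, this kind of AttributeError is what Python raises when code names an enum member that the installed library version does not define. The sketch below uses a stand-in enum, not the real pod5 one, to show the failure and a tolerant lookup; `OldEndReason` and `member_or_default` are invented for illustration.

```python
from enum import Enum


class OldEndReason(Enum):
    # Stand-in for an older enum that lacks a newer member; NOT the real pod5 enum.
    UNKNOWN = 0
    SIGNAL_POSITIVE = 1


def member_or_default(enum_cls, name, default_name="UNKNOWN"):
    """Look up an enum member by name, falling back to a default member."""
    return enum_cls.__members__.get(name, enum_cls[default_name])


print(member_or_default(OldEndReason, "SIGNAL_POSITIVE"))
print(member_or_default(OldEndReason, "API_REQUEST"))  # missing member, falls back

try:
    OldEndReason.API_REQUEST  # direct attribute access on a missing member
except AttributeError as err:
    print("AttributeError:", err)
```

A fallback like this hides the version mismatch rather than fixing it, so it is more of a diagnostic aid than a remedy.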

@Psy-Fer
Owner

Psy-Fer commented Oct 4, 2024

Hmm..make sure you have the latest pod5 version?

Which version do you have? Please do a pip list for me?
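Besides pip list, the installed version can be read from inside the same Python environment that blue-crab runs in. A small sketch using the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


for name in ("pod5", "blue-crab"):
    print(name, installed_version(name))
```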

@denisbeslic

Thank you for the fast answer, upgrading pod5 fixed the problem!

5 participants