Error in Clustering mode: Invalid clustering mode: 3 #8

Open
CWYuan08 opened this issue Apr 10, 2023 · 9 comments

@CWYuan08

Hi, I am trying to run isONclust2 as a first step before isONcorrect, but I get this error for every batch. One example:
Loaded input batch from batches/isONbatch_9.cer:
Batch number: 9
Batch range: [244492,273799]
Depth: -1
Nr sequences: 29308
Nr bases: 50001212
Nr clusters: 29308
Nr nontrivial clusters: 0
Minimizers in database: 0
Created pseudo-batch for single clustering:
Batch number: -9
Batch range: [244492,273799]
Depth: -1
Nr sequences: 29308
Nr bases: 0
Nr clusters: 29308
Nr nontrivial clusters: 0
Minimizers in database: 0
Resetting input clusters.
Clustering mode: Invalid clustering mode: 3

from running:
for f in batches/isONbatch_*.cer; do
  filename=$(basename "$f")
  output="clustered/${filename%.*}.cer"
  isONclust2 cluster -v -l "$f" -o "$output"
done

Could you please advise what I need to fix?

Many thanks!!

Best,
CW

@ksahlin

ksahlin commented Apr 10, 2023

Hi @CWYuan08,

Since I was the one who referred you here, I will add what I know. This seems to be the way to run isONclust2: https://github.com/epi2me-labs/wf-transcriptomes (and this section in particular: https://github.com/epi2me-labs/wf-isoforms#de-novo-based-approach-experimental)

You can then run isONcorrect on the clustered output, and isONform for consensus. I have not tried the approach they listed here, but they say it is experimental, which typically means no substantial benchmarks have been done.

Best,
K

@Johnsonzcode

Johnsonzcode commented Apr 12, 2023

Hi @ksahlin
But the de novo based approach (experimental) cannot be run in command-line mode. You can see here.
So maybe the isONclust-isONcorrect-isONform pipeline is the only way?

@cjw85
Member

cjw85 commented Apr 12, 2023

(Thanks @ksahlin for adding some comments here).

@Johnsonzcode the de-novo based approaches are indeed still largely experimental and so the code is not well-maintained. This project is not currently maintained and there is no one at Oxford Nanopore Technologies currently studying de-novo approaches. I dare say that @ksahlin is far more of an expert in the space than we are.

@Johnsonzcode

So how could I get non-redundant isoforms from ONT full-length transcripts? Is there a suggested pipeline?

@Johnsonzcode

> (Thanks @ksahlin for adding some comments here).
>
> @Johnsonzcode the de-novo based approaches are indeed still largely experimental and so the code is not well-maintained. This project is not currently maintained and there is no one at Oxford Nanopore Technologies currently studying de-novo approaches. I dare say that @ksahlin is far more of an expert in the space than we are.

But how could I solve the error mentioned above? Or is there a suggested pipeline to get non-redundant isoforms from ONT full-length transcriptome sequencing?

@ksahlin

ksahlin commented Apr 12, 2023

> Or is there a suggested pipeline to get non-redundant isoforms from ONT full-length transcriptome sequencing?

I can suggest running pychopper-isONclust-isONcorrect-isONform for this. The problem is that isONclust does not scale to very large datasets. This is what @CWYuan08 noticed and, hence, we ended up here looking at isONclust2 as a replacement for isONclust. Another way is to manually batch (i.e. split) your large dataset into independent batches that isONclust can run on separately.
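
As an illustration of the manual batching idea above, here is a minimal sketch that splits one FASTQ file into smaller files by read count. It is not part of isONclust or isONclust2: the function name `split_fastq`, the `batch_N.fastq` naming, and the batch size are all hypothetical, and it assumes plain 4-line FASTQ records. Splitting by read order is also an assumption, since it does not guarantee that reads from the same gene land in the same batch.

```python
import itertools
from pathlib import Path


def split_fastq(fastq_path, out_dir, reads_per_batch):
    """Split a FASTQ file into files of at most `reads_per_batch` reads.

    Assumes the common 4-line-per-record FASTQ layout (no wrapped sequences).
    Returns the list of batch file paths that were written.
    """
    if reads_per_batch < 1:
        raise ValueError("reads_per_batch must be at least 1")
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    batch_paths = []
    with open(fastq_path) as handle:
        for index in itertools.count():
            # Take the next chunk of complete records (4 lines each).
            lines = list(itertools.islice(handle, 4 * reads_per_batch))
            if not lines:
                break
            if len(lines) % 4 != 0:
                raise ValueError("truncated FASTQ record at end of file")
            # Hypothetical naming scheme: batch_0.fastq, batch_1.fastq, ...
            batch_path = out_dir / f"batch_{index}.fastq"
            batch_path.write_text("".join(lines))
            batch_paths.append(batch_path)
    return batch_paths
```

Each file returned by `split_fastq("reads.fastq", "batches", 100000)` could then be passed to its own isONclust run; the right batch size depends on the dataset and the available memory.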

@Johnsonzcode

Johnsonzcode commented Apr 13, 2023

> > Or is there a suggested pipeline to get non-redundant isoforms from ONT full-length transcriptome sequencing?
>
> I can suggest running pychopper-isONclust-isONcorrect-isONform for this. The problem is that isONclust does not scale to very large datasets. This is what @CWYuan08 noticed and, hence, we ended up here looking at isONclust2 as a replacement for isONclust. Another way is to manually batch (i.e. split) your large dataset into independent batches that isONclust can run on separately.

This pipeline may work.

@cjw85
Member

cjw85 commented Apr 13, 2023

@Johnsonzcode

The https://github.com/epi2me-labs/wf-isoforms pipeline is deprecated and its functionality has been folded into wf-transcriptomes. If you wish to use the de-novo route through wf-transcriptomes, let's work to uncover the bug you are seeing with its use on the issue you have already started over there. I feel we've gone a bit off topic from @CWYuan08's original post here.

@CWYuan08
Author

Dear @Johnsonzcode, @cjw85, @ksahlin,
Thank you all for the useful discussion here. This is what I wanted to ask about and follow as well!
Best,
CW
