Hi, I have a similar question. I am using Borzoi with grelu.data.dataset.AnnDataSeqDataset. I have checked that every gene sequence in my dataset has length 524288, but I still receive this assertion error:
File /home/tl688/.conda/envs/evo/lib/python3.11/site-packages/grelu/sequence/format.py:414, in convert_input_type(inputs, output_type, genome, add_batch_axis)
412 return strings_to_one_hot(inputs, add_batch_axis=add_batch_axis)
413 elif output_type == "indices":
--> 414 return strings_to_indices(inputs, add_batch_axis=add_batch_axis)
416 # Convert indices
417 if input_type == "indices":
File /home/tl688/.conda/envs/evo/lib/python3.11/site-packages/grelu/sequence/format.py:251, in strings_to_indices(strings, add_batch_axis)
247 return arr
249 # Convert multiple sequences; they must all have equal length
250 else:
--> 251 assert check_equal_lengths(
252 strings
253 ), "All input sequences must have the same length."
254 return np.stack(
255 [[BASE_TO_INDEX_HASH[base] for base in string] for string in strings]
256 ).astype(np.int8)
AssertionError: All input sequences must have the same length.
Update: the error I was reporting is actually independent of sequence length / extending beyond chromosome boundary.
I walked the call stack for grelu.data.dataset.VariantDataset(), and the source of my error is actually VariantDataset()._load_alleles(), which internally calls grelu.sequence.format.strings_to_indices(). That function expects all alleles to have the same length (presumably each variant is expected to be a SNP rather than an indel).
Here is an updated reprex:
from grelu.sequence.format import strings_to_indices
alleles = ['A', 'C', 'AC', 'G']
strings_to_indices(alleles)
## AssertionError: All input sequences must have the same length.
## VERSUS
alleles = ['A', 'C', 'G']
strings_to_indices(alleles)
## array([[0],
##        [1],
##        [2]], dtype=int8)
So the simple fix for me as a user is just to focus on SNPs, at least for now.
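For anyone hitting the same assertion, here is a minimal sketch of that workaround: separating single-base substitutions from other variants before building the dataset. The helper name `split_snps_and_indels` and the `(ref, alt)` tuple layout are my own assumptions for illustration, not part of gReLU.

```python
def split_snps_and_indels(variants):
    """Split (ref, alt) allele pairs into SNPs and everything else.

    Hypothetical helper, not a gReLU function. A pair counts as a SNP
    only if both alleles are exactly one base long.
    """
    snps, others = [], []
    for ref, alt in variants:
        if len(ref) == 1 and len(alt) == 1:
            snps.append((ref, alt))
        else:
            others.append((ref, alt))
    return snps, others


# Made-up example alleles; the second pair is an insertion.
variants = [("A", "C"), ("A", "AC"), ("G", "T")]
snps, others = split_snps_and_indels(variants)
print(snps)    # [('A', 'C'), ('G', 'T')]
print(others)  # [('A', 'AC')]
```

Only the `snps` list would then be passed on, so every allele has length 1 and the equal-length check in strings_to_indices() is satisfied.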
Hi gReLU team,
Here is a reprex of a pair of SNPs that I am trying to predict using Borzoi. The prediction interval of the first is contained within the chr, but not the 2nd.
This gives the error
It would be great for this to be caught automatically somehow :)
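In the meantime, a user-side check along these lines might catch such variants before prediction. This is only a sketch under stated assumptions: the window is assumed to be centered on the variant with 0-based, half-open coordinates (gReLU's actual centering may differ), the function names are hypothetical, and the chromosome size is a made-up value. The sequence length 524288 is the one mentioned in this thread.

```python
def centered_interval(pos, seq_len):
    """Return (start, end) of a window of seq_len bases centered on pos.

    Assumes 0-based, half-open coordinates; this is an illustrative
    convention, not necessarily the one gReLU uses internally.
    """
    start = pos - seq_len // 2
    return start, start + seq_len


def fits_on_chrom(pos, seq_len, chrom_size):
    """True if the centered window lies entirely within [0, chrom_size)."""
    start, end = centered_interval(pos, seq_len)
    return start >= 0 and end <= chrom_size


SEQ_LEN = 524288        # input length mentioned in this thread
CHROM_SIZE = 1_000_000  # hypothetical chromosome length, for illustration

print(fits_on_chrom(500_000, SEQ_LEN, CHROM_SIZE))  # True
print(fits_on_chrom(100_000, SEQ_LEN, CHROM_SIZE))  # False: window starts before 0
```

Variants for which the check returns False could be dropped or reported before they reach the model.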