Assorted bugs and possibly undefined behavior in closest #167

Open
endrebak opened this issue Jul 16, 2023 · 0 comments
@endrebak

I am using property-based testing with Hypothesis to check that pyranges and bioframe return exactly the same results.

This has turned up several minor but annoying bugs.

  1. When no closest interval is found, bf.closest raises an IndexError:
import bioframe as bf
df = bf.from_any([['chr1', 100, 110]], name_col='chrom')
bf.closest(df, df.copy(), ignore_overlaps=True)
~/anaconda3/lib/python3.8/site-packages/bioframe/core/arrops.py in closest_intervals(starts1, ends1, starts2, ends2, k, tie_arr, ignore_overlaps, ignore_upstream, ignore_downstream, direction)
    734     interval1_run_starts = interval1_run_borders[:-1]
    735     interval1_run_ends = interval1_run_borders[1:]
--> 736     closest_ids = closest_ids[
    737         arange_multi(
    738             interval1_run_starts,

IndexError: index 0 is out of bounds for axis 0 with size 0

Suggested solution: return the same result you already produce when df2 shares no chromosomes with df1, i.e. the df1 row with the df2 columns and the distance set to missing:

  chrom  start  end chrom_  start_  end_  distance
0  chr1    100  110   <NA>    <NA>  <NA>      <NA>
  2. bf.closest does not handle empty dataframes:
import pandas as pd
# Empty dataframe with the same columns and dtypes as df
df2 = pd.DataFrame({c: pd.Series([], dtype=t) for c, t in df.dtypes.items()})
bf.closest(df2, df)
~/anaconda3/lib/python3.8/site-packages/bioframe/ops.py in _closest_intidxs(df1, df2, k, ignore_overlaps, ignore_upstream, ignore_downstream, direction_col, tie_breaking_col, cols1, cols2)
   1020
   1021     if len(closest_intidxs) == 0:
-> 1022         return np.ndarray(shape=(0, 2), dtype=np.int)
   1023     closest_intidxs = np.vstack(closest_intidxs)
   1024

~/anaconda3/lib/python3.8/site-packages/numpy/__init__.py in __getattr__(attr)
    282             return Tester
    283
--> 284         raise AttributeError("module {!r} has no attribute "
    285                              "{!r}".format(__name__, attr))
    286

AttributeError: module 'numpy' has no attribute 'int'

Suggested solution: return an empty dataframe with the columns from df2 added.
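
To make the two suggested solutions concrete, here is a minimal sketch of what I have in mind. It is not bioframe's code: `empty_index_pairs` and `closest_fallback` are hypothetical helper names, and the `_` suffix and the nullable `Int64` dtype are assumptions chosen to match the output shown under the first bug. The first helper swaps `np.int` (removed in NumPy 1.24) for `np.int64`; the second keeps every row of df1 and fills the df2 columns and the distance with `<NA>`, which also yields an empty dataframe with the df2 columns added when df1 is empty.

```python
import numpy as np
import pandas as pd


def empty_index_pairs():
    """Empty (0, 2) integer array built without the removed np.int alias."""
    return np.empty(shape=(0, 2), dtype=np.int64)


def _na_column(n, dtype):
    """Column of n missing values; integer dtypes become nullable Int64."""
    if pd.api.types.is_integer_dtype(dtype):
        return pd.Series([pd.NA] * n, dtype="Int64")
    return pd.Series([pd.NA] * n, dtype=object)


def closest_fallback(df1, df2, suffix="_"):
    """Hypothetical fallback for 'no closest interval found'.

    Keeps every row of df1 and appends the df2 columns (with `suffix`)
    and a `distance` column, all filled with <NA>.
    """
    out = df1.reset_index(drop=True).copy()
    n = len(out)
    for col in df2.columns:
        out[col + suffix] = _na_column(n, df2[col].dtype)
    out["distance"] = pd.Series([pd.NA] * n, dtype="Int64")
    return out


df = pd.DataFrame({"chrom": ["chr1"], "start": [100], "end": [110]})
no_match = closest_fallback(df, df.copy())   # first bug: one row, df2 side all <NA>
empty_in = closest_fallback(df.iloc[0:0], df)  # second bug: zero rows, same columns
```

Both calls return the columns chrom, start, end, chrom_, start_, end_, distance, so downstream code sees the same schema whether or not a closest interval exists.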


This isn't critical, but it would be nice if you could fix these eventually. Hypothesis stops at the first error it finds, so these bugs keep me from testing properly.

I made the title general because I might update the issue with more bugs as I find them.
