Assorted bugs and possibly undefined behavior in closest #167

endrebak · 2023-07-16T10:57:20Z

I am using property-based testing with Hypothesis to ensure that pyranges and bioframe return exactly the same results.

This has led me to discover many trifling but annoying bugs.

When no closest interval is found, bf.closest raises an IndexError:

import bioframe as bf

df = bf.from_any([['chr1', 100, 110]], name_col='chrom')
bf.closest(df, df.copy(), ignore_overlaps=True)
~/anaconda3/lib/python3.8/site-packages/bioframe/core/arrops.py in closest_intervals(starts1, ends1, starts2, ends2, k, tie_arr, ignore_overlaps, ignore_upstream, ignore_downstream, direction)
    734     interval1_run_starts = interval1_run_borders[:-1]
    735     interval1_run_ends = interval1_run_borders[1:]
--> 736     closest_ids = closest_ids[
    737         arange_multi(
    738             interval1_run_starts,

IndexError: index 0 is out of bounds for axis 0 with size 0

Suggested solution: return the df1 row padded with NA, which is how you already handle the case where df2 shares no chromosomes with df1:

  chrom  start  end chrom_  start_  end_  distance
0  chr1    100  110   <NA>    <NA>  <NA>      <NA>
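
For illustration, here is a minimal sketch of how such a padded result could be built with pandas alone. The helper name na_padded_closest and the "_" suffix default are hypothetical, chosen to match the output above; this is not bioframe's implementation, only one way the suggested behaviour might look.

```python
import pandas as pd


def na_padded_closest(df1, df2, suffix="_"):
    """Hypothetical helper: one row per df1 row, with all df2 columns missing."""
    left = df1.reset_index(drop=True)
    # convert_dtypes() switches to nullable dtypes so that integer columns
    # can hold missing values; iloc[:0] keeps the columns but no rows.
    right = df2.convert_dtypes().iloc[:0].add_suffix(suffix)
    # Reindexing an empty frame onto df1's index yields all-missing rows.
    right = right.reindex(left.index)
    right["distance"] = pd.array([pd.NA] * len(left), dtype="Int64")
    return pd.concat([left, right], axis=1)


df = pd.DataFrame({"chrom": ["chr1"], "start": [100], "end": [110]})
out = na_padded_closest(df, df.copy())
print(out)
```

The nullable Int64 dtype is used so that start_, end_ and distance stay integer columns even when every value is missing.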

bf.closest does not handle empty dataframes:

import pandas as pd

df2 = pd.DataFrame({c: pd.Series([], dtype=t) for c, t in df.dtypes.items()})
bf.closest(df2, df)
~/anaconda3/lib/python3.8/site-packages/bioframe/ops.py in _closest_intidxs(df1, df2, k, ignore_overlaps, ignore_upstream, ignore_downstream, direction_col, tie_breaking_col, cols1, cols2)
   1020
   1021     if len(closest_intidxs) == 0:
-> 1022         return np.ndarray(shape=(0, 2), dtype=np.int)
   1023     closest_intidxs = np.vstack(closest_intidxs)
   1024

~/anaconda3/lib/python3.8/site-packages/numpy/__init__.py in __getattr__(attr)
    282             return Tester
    283
--> 284         raise AttributeError("module {!r} has no attribute "
    285                              "{!r}".format(__name__, attr))
    286

AttributeError: module 'numpy' has no attribute 'int'

Suggested solution: return an empty dataframe with the columns from df2 added.
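
A rough sketch of both pieces, assuming only numpy and pandas. The immediate AttributeError comes from np.int, an alias that was deprecated in NumPy 1.20 and removed in 1.24; the builtin int works in its place. The helper empty_closest_result and the "_" suffix are hypothetical names used for illustration, not bioframe's API.

```python
import numpy as np
import pandas as pd

# np.int no longer exists in recent NumPy releases; the builtin int is the
# drop-in replacement for the empty (0, 2) index array in the traceback.
empty_pairs = np.empty((0, 2), dtype=int)


def empty_closest_result(df1, df2, suffix="_"):
    """Hypothetical helper: zero-row frame with df1's and df2's columns."""
    # iloc[:0] drops all rows but keeps column names and dtypes.
    left = df1.iloc[:0].reset_index(drop=True)
    right = df2.iloc[:0].add_suffix(suffix).reset_index(drop=True)
    result = pd.concat([left, right], axis=1)
    result["distance"] = pd.Series([], dtype="Int64")
    return result


df = pd.DataFrame({"chrom": ["chr1"], "start": [100], "end": [110]})
df_empty = pd.DataFrame({c: pd.Series([], dtype=t) for c, t in df.dtypes.items()})
result = empty_closest_result(df_empty, df)
print(list(result.columns), len(result))
```

Returning a zero-row frame with the full column set would let downstream code treat the empty case like any other result.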

This isn't critical, but it would be nice if you could fix this eventually. Hypothesis stops at the first error it finds, so these bugs prevent me from testing properly.

I made the title general because I might update the issue with more bugs as I find them.

gfudenberg added the bug label Dec 20, 2023

harshit148 mentioned this issue Mar 10, 2024

#167: Replaced np.int with int as the attribute is deprecated by numpy #192

Merged
