Closed
Labels
good first issue
Description
Describe the bug
CSVDataset accepts pandas DataFrames as input for src, but it assumes the DataFrame's index is the default 0..n-1 range.
This is because convert_tables_to_dicts uses .loc instead of .iloc: it generates ordinal (positional) indices to subset on but treats them as index labels.
Line 1494 in 0bb20a8
data_ = df.loc[rows] if col_names is None else df.loc[rows, col_names]
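The mismatch can be reproduced with pandas alone. This sketch assumes, as described in this issue, that the row selector is slice(n) with n the number of rows; it does not call MONAI.

```python
import numpy as np
import pandas as pd

# Every 5th row, so the index labels are 0, 5, 10, ..., 45 rather than 0..9.
df = pd.DataFrame(np.random.random((50, 3)))
df_subset = df.iloc[np.arange(0, 50, 5)]

# A selector meant to cover "all rows" by position, built from the row count.
rows = slice(df_subset.shape[0])  # slice(None, 10, None)

by_label = df_subset.loc[rows]      # label slice, end-inclusive: labels 0, 5, 10
by_position = df_subset.iloc[rows]  # positional slice: the first 10 rows

print(len(by_label), list(by_label.index))  # 3 [0, 5, 10]
print(len(by_position))                     # 10
```

With a default RangeIndex the two calls happen to agree, which is why the bug only shows up for filtered or re-indexed DataFrames.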
To Reproduce
import numpy
import pandas
import monai
df = pandas.DataFrame(numpy.random.random((50, 3)))
df_subset = df.iloc[numpy.arange(0, 50, 5)]
print(df_subset.shape) # (10, 3)
ds = monai.data.CSVDataset(df_subset)
print(len(ds)) # 3
Expected behavior
print(len(ds)) should print 10.
It prints 3 because the rows are looked up with slice(10), which .loc treats as an end-inclusive label slice; in the subset that matches only the labels 0, 5 and 10.
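Until the selection is fixed, resetting the index is a likely workaround: once the labels are 0..n-1, the end-inclusive label slice slice(n) covers every row. This is a pandas-only sketch; that CSVDataset then reports 10 rows is an expectation based on the analysis above, not something run here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((50, 3)))
df_subset = df.iloc[np.arange(0, 50, 5)]

# Relabel rows as 0..n-1 so label-based and positional selection agree.
# drop=True discards the old labels instead of adding them as an extra column
# (which would otherwise show up as a data column downstream).
df_reset = df_subset.reset_index(drop=True)

rows = slice(df_reset.shape[0])
print(len(df_reset.loc[rows]))  # 10
```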
Environment
Shouldn't be relevant?
Additional context
Simple fix:
Line 1494 in 0bb20a8
data_ = df.loc[rows] if col_names is None else df.loc[rows, col_names]
The first .loc should be .iloc, and the second should be df.iloc[rows][col_names] (select rows by position, then columns by name).
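A minimal sketch of the proposed fix as a standalone function. select_rows_and_cols is a hypothetical name, not part of MONAI's API; it assumes rows is a positional selector (a slice or a list of ints) and col_names is a list of column labels.

```python
import numpy as np
import pandas as pd


def select_rows_and_cols(df, rows, col_names=None):
    """Hypothetical helper: rows are positional, col_names are column labels."""
    if col_names is None:
        return df.iloc[rows]
    # Positional row selection first, then label-based column selection.
    return df.iloc[rows][col_names]


df = pd.DataFrame(np.random.random((50, 3)), columns=["a", "b", "c"])
df_subset = df.iloc[np.arange(0, 50, 5)]  # labels 0, 5, ..., 45

rows = slice(df_subset.shape[0])
print(select_rows_and_cols(df_subset, rows).shape)              # (10, 3)
print(select_rows_and_cols(df_subset, rows, ["a", "c"]).shape)  # (10, 2)
```

Chained indexing is harmless here because the result is only read, never assigned to; df.iloc[rows].loc[:, col_names] is an equivalent alternative that avoids it.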