nrows in benchmark_dataframe.py might skew results for test_Read_CSV
We currently use nrows to parameterize DataFrame reads in test_Read_CSV, which might skew the results for smaller files.
pybench/pybench/benchmarks/benchmark_dataframe.py
Line 16 in 89d65a6
In the current cudf implementation, I think that when nrows is passed we still parse the entire file in GPU memory to find line terminators (and quote characters).
This might skew our results for reads that use nrows, so we might want to change it.
See these comments: rapidsai/cudf#1643 (comment) and rapidsai/cudf#1643 (comment), both on issue rapidsai/cudf#1643.
In the tests below I found a large performance difference (72.8 ms vs 1.6 s).
Take the head of the file for reading:
import cudf
!head -n 100001 '/datasets/nyc_taxi/2015/yellow_tripdata_2015-01.csv' > 'yellow_tripdata_2015-01_head_100k.csv'
Timing when reading from the small file:
%timeit df = cudf.read_csv('yellow_tripdata_2015-01_head_100k.csv', nrows=100_000)
72.8 ms ± 28.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
Timing when reading from the whole file:
%timeit df = cudf.read_csv('/datasets/nyc_taxi/2015/yellow_tripdata_2015-01.csv', nrows=100_000)
1.6 s ± 45.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
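One possible way to avoid the skew would be for the benchmark to write a truncated copy of the input for each row count up front and then time a plain read of that copy, instead of passing nrows against the full file. The sketch below is only an illustration of that idea using the standard library; write_head and make_truncated_inputs are hypothetical helper names, not part of pybench or cudf, and it assumes one record per line (no quoted fields containing newlines).

```python
import itertools
import os
import tempfile


def write_head(src_path, dst_path, nrows, header_lines=1):
    """Copy the header plus the first `nrows` data lines of src_path to dst_path.

    Rough Python equivalent of `head -n <nrows + header_lines>`.
    Assumption: every record sits on exactly one line.
    """
    with open(src_path, "r", newline="") as src, \
            open(dst_path, "w", newline="") as dst:
        # islice stops after the requested lines, so the rest of the
        # source file is never read.
        dst.writelines(itertools.islice(src, nrows + header_lines))
    return dst_path


def make_truncated_inputs(src_path, row_counts, out_dir):
    """Create one truncated CSV per row count and return {nrows: path}."""
    paths = {}
    for n in row_counts:
        dst = os.path.join(out_dir, f"head_{n}.csv")
        paths[n] = write_head(src_path, dst, n)
    return paths


if __name__ == "__main__":
    # Self-check on a small synthetic file (hypothetical data).
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "full.csv")
        with open(src, "w", newline="") as f:
            f.write("a,b\n")
            f.writelines(f"{i},{i * 2}\n" for i in range(1000))
        for n, path in make_truncated_inputs(src, [10, 100], tmp).items():
            with open(path) as f:
                data_rows = sum(1 for _ in f) - 1  # subtract the header
            print(n, data_rows)
```

The benchmark could then time cudf.read_csv(paths[n]) without nrows for each n, so the measured cost would depend only on the rows actually read and the file preparation would stay outside the timed region.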