parquet instead of pickle? #5

aminnj · 2021-02-11T23:33:42Z

Sorry for being nosy - I am curious about all the columnar technologies in an actual analysis context like Hgg :)

I saw that pickle is mentioned/used in several places. You might try parquet instead (`df.to_parquet()`, `pd.read_parquet()`). It's essentially the industry counterpart of ROOT (a columnar on-disk format), so it should serialize/deserialize much faster than compressed pickle. Uncompressed pickle is probably faster, but if you use `df.to_pickle("blah.pkl.gz")`, parquet will likely win. Of course this all depends on how big your files are.

sam-may · 2021-02-12T22:43:51Z

I appreciate the nosiness!

I was using pickle pretty naively. I'll try this out and let you know how it works.
