parquet instead of pickle? #5

Open
aminnj opened this issue Feb 11, 2021 · 1 comment

Comments

@aminnj

aminnj commented Feb 11, 2021

Sorry for being nosy - I am curious about all the columnar technologies in an actual analysis context like Hgg :)

I saw that pickle is mentioned/used in several places. You might try parquet instead (df.to_parquet(), pd.read_parquet()). It's essentially the industry counterpart to ROOT (a compressed columnar format), so it should be faster at serializing/deserializing than compressed pickle. Uncompressed pickle is probably faster, but compared with df.to_pickle("blah.pkl.gz"), parquet will likely be faster. Of course, this all depends on how big your files are.
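A rough way to check this on your own data is to time both round trips side by side. The sketch below is only an illustration: the helper names (time_roundtrip, compare_formats), the toy columns, and the file names are made up for this example, and it assumes a parquet engine such as pyarrow is installed (if none is, the parquet entry is skipped).

```python
import tempfile
import time
from pathlib import Path

import numpy as np
import pandas as pd


def time_roundtrip(write, read, path):
    """Return (write_seconds, read_seconds, loaded_frame) for one format."""
    start = time.perf_counter()
    write(path)
    mid = time.perf_counter()
    loaded = read(path)
    end = time.perf_counter()
    return mid - start, end - mid, loaded


def compare_formats(df, directory):
    """Time gzip-compressed pickle against parquet for the same DataFrame."""
    directory = Path(directory)
    results = {}

    # pandas infers gzip compression from the ".gz" suffix.
    w, r, loaded = time_roundtrip(
        df.to_pickle, pd.read_pickle, directory / "blah.pkl.gz"
    )
    results["pickle.gz"] = {
        "write_s": w,
        "read_s": r,
        "roundtrip_ok": loaded.equals(df),
    }

    try:
        # Needs pyarrow or fastparquet; pandas raises ImportError otherwise.
        w, r, loaded = time_roundtrip(
            df.to_parquet, pd.read_parquet, directory / "blah.parquet"
        )
        results["parquet"] = {
            "write_s": w,
            "read_s": r,
            "roundtrip_ok": loaded.equals(df),
        }
    except ImportError:
        results["parquet"] = None  # no parquet engine available

    return results


if __name__ == "__main__":
    # Toy stand-in for an analysis DataFrame; column names are hypothetical.
    rng = np.random.default_rng(0)
    df = pd.DataFrame(
        {
            "mass": rng.normal(125.0, 2.0, size=100_000),
            "n_jets": rng.integers(0, 6, size=100_000),
        }
    )
    with tempfile.TemporaryDirectory() as tmp:
        for name, timing in compare_formats(df, tmp).items():
            print(name, timing)
```

The numbers will vary with file size, column types, and the compression settings, so it is worth running this on a DataFrame that resembles the real one rather than trusting the toy data.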

@sam-may
Collaborator

sam-may commented Feb 12, 2021

I appreciate the nosiness!

I was using pickle pretty naively. I'll try this out and let you know how it works.
