Add support for variable length strings #62

Open
dragly opened this issue Nov 8, 2018 · 1 comment

Comments

@dragly
Member

dragly commented Nov 8, 2018

Currently, structured NumPy arrays work fine in our Python API, although other languages might not support them when reading the data back. This means that data from, for instance, pandas can be saved and loaded by converting it to a structured array with `DataFrame.to_records()`.
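As a minimal sketch of that round trip, assuming only NumPy and its `.npy` serialization: the array below stands in for what `DataFrame.to_records()` might return for a small frame (the field names and values are made up), and the string field uses a fixed-width `U8` dtype so that no pickling is needed. `allow_pickle=False` is passed to both `np.save` and `np.load` to mirror the no-pickle rule.

```python
import io

import numpy as np

# Hypothetical records: two numeric fields and a fixed-width (max 8 chars) string field.
records = np.array(
    [(1, 0.5, "alpha"), (2, 1.5, "beta")],
    dtype=[("id", "i8"), ("value", "f8"), ("name", "U8")],
)

# Round-trip through the .npy format without pickling.
buffer = io.BytesIO()
np.save(buffer, records, allow_pickle=False)
buffer.seek(0)
loaded = np.load(buffer, allow_pickle=False)

print(loaded.dtype.names)  # ('id', 'value', 'name')
print(loaded["name"])      # ['alpha' 'beta']
```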

However, we do not support variable-length strings. These appear as `object` in the dtype, so they produce object arrays, which are not allowed (see #47) because they need to be pickled.
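A short illustration of the problem, assuming plain NumPy `.npy` serialization as a stand-in for our format: strings of varying length stored as Python objects give an object array, which `np.save` refuses to write without pickling. The fixed-width conversion at the end avoids pickling, but pads every element to the length of the longest string.

```python
import io

import numpy as np

# Python strings of varying length end up in an object array...
names = np.array(["a", "much longer name"], dtype=object)

buffer = io.BytesIO()
try:
    # ...and object arrays can only be written to .npy by pickling them.
    np.save(buffer, names, allow_pickle=False)
except ValueError as error:
    print(error)  # Object arrays cannot be saved when allow_pickle=False

# Converting to a fixed-width Unicode dtype avoids pickling, but every element
# now occupies room for the longest string (16 characters here).
fixed = names.astype(str)
print(fixed.dtype)  # <U16
```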

We should look into ways of storing variable-length strings. However, they are not trivial to implement on top of the simple NumPy format, so we might need to consider adding a different backend for this purpose. My best bet for a cross-platform and lightweight format is SQLite, but that is still a large dependency to pull in for a single feature.
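To give a feel for the SQLite option, here is a minimal sketch using Python's standard-library `sqlite3` module with an in-memory database. The table name and columns are hypothetical and do not reflect any schema in the project; the point is only that `TEXT` columns store each string at its actual length, with no padding and no pickling.

```python
import sqlite3

# Hypothetical table: one row per record, strings stored as variable-length TEXT.
connection = sqlite3.connect(":memory:")
connection.execute("CREATE TABLE records (id INTEGER, value REAL, name TEXT)")
connection.executemany(
    "INSERT INTO records (id, value, name) VALUES (?, ?, ?)",
    [(1, 0.5, "a"), (2, 1.5, "much longer name")],
)
connection.commit()

# Read the rows back in a deterministic order.
rows = connection.execute("SELECT id, value, name FROM records ORDER BY id").fetchall()
print(rows)  # [(1, 0.5, 'a'), (2, 1.5, 'much longer name')]
connection.close()
```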

@dragly
Member Author

dragly commented Feb 4, 2019

It seems an interesting option is to take a closer look at Apache Arrow and its Feather or Parquet implementations: https://github.com/wesm/feather https://arrow.apache.org/docs/python/parquet.html
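Ignoring nulls, Arrow's columnar format represents a column of variable-length strings as one contiguous data buffer plus an offsets buffer with one more entry than there are strings. The sketch below mimics that layout with two plain NumPy arrays, so the strings fit in an `.npz` archive without pickling. The helpers `encode_strings` and `decode_strings` are hypothetical; they are not part of pyarrow and do not produce Feather or Parquet files.

```python
import io

import numpy as np


def encode_strings(strings):
    """Pack strings into an Arrow-style (offsets, data) pair of plain NumPy arrays."""
    encoded = [s.encode("utf-8") for s in strings]
    # offsets has len(strings) + 1 entries; string i spans data[offsets[i]:offsets[i + 1]].
    offsets = np.zeros(len(encoded) + 1, dtype=np.int64)
    offsets[1:] = np.cumsum([len(b) for b in encoded])
    data = np.frombuffer(b"".join(encoded), dtype=np.uint8)
    return offsets, data


def decode_strings(offsets, data):
    """Inverse of encode_strings: slice the byte buffer using the offsets."""
    raw = data.tobytes()
    return [raw[offsets[i]:offsets[i + 1]].decode("utf-8") for i in range(len(offsets) - 1)]


strings = ["a", "much longer name", "naïve"]
offsets, data = encode_strings(strings)

# Both arrays are numeric, so they can be stored and loaded without pickling.
buffer = io.BytesIO()
np.savez(buffer, offsets=offsets, data=data)
buffer.seek(0)
with np.load(buffer, allow_pickle=False) as archive:
    restored = decode_strings(archive["offsets"], archive["data"])

print(restored == strings)  # True
```

Storing byte offsets rather than padded fixed-width elements means each string costs only its encoded length, at the price of one extra integer per string.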
