Difficulty with ingesting large files #181

Open
Gpadh opened this issue Dec 1, 2023 · 0 comments
Gpadh commented Dec 1, 2023

I'm trying to ingest a 100+ GB file of legal data into a Kernel Memory service. The data I would like to access is the "opinions" files from this link (https://com-courtlistener-storage.s3-us-west-2.amazonaws.com/list.html?prefix=bulk-data/). They are bzip2-compressed (.bz2) files.

To ingest the data, I use azcopy to copy a file into a blob container. I then have a function that triggers when a file lands in this container; it decompresses the .bz2 file and sends the contents to Kernel Memory as a stream. The compressed file is about 30 GB; decompressed, it grows to 100+ GB.
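
The function looks roughly like this (a simplified sketch, not the exact code from my repo: it assumes the in-process Azure Functions model, SharpZipLib for the bzip2 decompression, and Kernel Memory's MemoryWebClient; the container name "legal-data" and the service endpoint are placeholders):

```csharp
// Simplified version of the ingestion function. Assumes the in-process
// Azure Functions model, SharpZipLib for bzip2 decompression, and Kernel
// Memory's web client; container name and endpoint are placeholders.
using System.IO;
using System.Threading.Tasks;
using ICSharpCode.SharpZipLib.BZip2;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
using Microsoft.KernelMemory;

public static class IngestOnUpload
{
    [FunctionName("IngestOnUpload")]
    public static async Task Run(
        [BlobTrigger("legal-data/{name}")] Stream blob, // fires after azcopy lands a file
        string name,
        ILogger log)
    {
        var memory = new MemoryWebClient("http://localhost:9001/"); // KM service endpoint (placeholder)

        // Decompress on the fly and hand the stream straight to Kernel Memory,
        // so the 100+ GB of plaintext is never buffered in memory or on disk.
        using var decompressed = new BZip2InputStream(blob);
        await memory.ImportDocumentAsync(
            decompressed,
            fileName: Path.GetFileNameWithoutExtension(name), // e.g. "opinions.csv" from "opinions.csv.bz2"
            documentId: name);

        log.LogInformation($"Sent {name} to Kernel Memory for ingestion");
    }
}
```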

This is the error message I get when I try to ingest the files into Kernel Memory:
[error message screenshot omitted]

A repository to reproduce the issue is here: https://github.com/Gpadh/KMFileIngestion/tree/master

Please let me know if I can provide any more details to help.

@dluc added the triage label Dec 4, 2023