Cannot open the file: MusicBrainz releases #155

Closed
Yueqiao12Zhang opened this issue Aug 19, 2024 · 10 comments · May be fixed by #165
Labels: Priority: high

Comments

@Yueqiao12Zhang
Contributor

Yueqiao12Zhang commented Aug 19, 2024

The csv2rdf script uses too much memory when working with the releases data dump: after the program has run for a long time, the kernel kills it with Killed: 9.

@dchiller
Contributor

dchiller commented Aug 22, 2024

Could you post the traceback here? Is this the memory issue we talked about in the lab meeting?

@Yueqiao12Zhang
Contributor Author

Yes. I will copy the error message from the lab Mac after the meeting.

@Yueqiao12Zhang
Contributor Author

Yueqiao12Zhang commented Aug 22, 2024

import json

CHUNK_SIZE = 10000

if __name__ == "__main__":

    # the file must be from MusicBrainz's JSON data dumps.
    chunk = []

    with open(inputpath, "r", encoding="utf-8") as f:
        print(1) # It does not reach this line.
        for line in f:
            line_data = json.loads(line)  # Parse each line as a JSON object
            chunk.append(line_data)  # Add the JSON object to the current chunk

            # When the chunk reaches the desired size, process it
            if len(chunk) == CHUNK_SIZE:
                print(2)
                extract(chunk, {})
                chunk.clear()  # Reset the chunk
                convert_dict_to_csv(values)
                values.clear()  # Reset values once this chunk has been written

        # Process any remaining data in the last chunk
        if chunk:
            extract(chunk, {})
            chunk.clear()
            convert_dict_to_csv(values)

It works fine with smaller files.
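
For reference, the chunked loop above can also be written as a generator, so that only one chunk of parsed objects is held in memory at a time. This is a minimal sketch, assuming the input is JSON Lines (one JSON object per line); `iter_chunks` is a hypothetical helper and is not part of the repository.

```python
import json
from itertools import islice
from typing import Iterable, Iterator


def iter_chunks(lines: Iterable[str], chunk_size: int) -> Iterator[list]:
    """Yield lists of at most chunk_size parsed JSON objects (hypothetical helper)."""
    it = iter(lines)
    while True:
        # Pull the next chunk_size raw lines without reading the whole input.
        batch = list(islice(it, chunk_size))
        if not batch:
            return
        # Skip blank lines, which json.loads would reject.
        parsed = [json.loads(line) for line in batch if line.strip()]
        if parsed:
            yield parsed


if __name__ == "__main__":
    # Stand-in for an open file object; a real run would pass the file handle.
    sample = ['{"id": 1}\n', '{"id": 2}\n', '{"id": 3}\n']
    for chunk in iter_chunks(sample, chunk_size=2):
        print(len(chunk), chunk)
```

An open file object can be passed directly as `lines`, since iterating over a text file yields one line at a time.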

@Yueqiao12Zhang
Contributor Author

Yueqiao12Zhang commented Aug 22, 2024

(local) DDMAL-004:csv local$ python3 convert_to_csv.py ../data/raw/extracted_jsonl/mbdump/release release
[1]+  Terminated: 15          python3 convert_to_csv.py ../data/raw/extracted_jsonl/mbdump/release release
Killed: 9

@Yueqiao12Zhang
Contributor Author

Python does not return an error message.
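
Killed: 9 corresponds to SIGKILL, which a process cannot catch, so Python gets no chance to print a traceback. One way to see whether memory is growing before the kill is to log the peak resident set size at a few points in the script. This is a rough sketch, assuming a Unix-like system (the `resource` module is not available on Windows); `peak_rss_mib` and `log_memory` are hypothetical helper names.

```python
import resource
import sys


def peak_rss_mib() -> float:
    """Return this process's peak resident set size in MiB."""
    rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return rss / divisor


def log_memory(label: str) -> None:
    """Write a labelled memory reading to stderr immediately."""
    print(f"[{label}] peak RSS: {peak_rss_mib():.1f} MiB", file=sys.stderr, flush=True)


if __name__ == "__main__":
    log_memory("start")
    data = [0] * 1_000_000  # allocate something so the reading can change
    log_memory("after allocation")
```

Calling `log_memory` before opening the file and after each processed chunk would show whether usage climbs steadily or jumps at one step.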

@dchiller
Contributor

And so you know the place where it is erroring because it doesn't reach the print statement?

@dchiller
Contributor

Where is the python file you are running?

Yueqiao12Zhang changed the title from "Killed 9 when running csv2rdf for MusicBrainz releases dumps" to "Cannot open the file: MusicBrainz releases" on Aug 22, 2024
@Yueqiao12Zhang
Contributor Author

> And so you know the place where it is erroring because it doesn't reach the print statement?

Yes.

> Where is the python file you are running?

/csv2rdf/csv2rdf_single_subject.py
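
Since the failing step here was located by a print statement that never appeared, a slightly more robust version of the same idea is to flush each marker immediately and to let the standard-library `faulthandler` module dump the current traceback periodically while the script runs. This is only a sketch; `checkpoint` and `watch_for_hang` are hypothetical helper names, and the 30-second interval is an arbitrary choice.

```python
import faulthandler
import sys


def checkpoint(label: str) -> None:
    """Print a progress marker immediately, bypassing output buffering."""
    print(f"checkpoint: {label}", file=sys.stderr, flush=True)


def watch_for_hang(seconds: float) -> None:
    """Dump all thread tracebacks to stderr every `seconds` seconds until cancelled."""
    faulthandler.dump_traceback_later(seconds, repeat=True, file=sys.stderr)


if __name__ == "__main__":
    watch_for_hang(30)
    checkpoint("before open")
    # ... open and process the input file here ...
    checkpoint("after processing")
    # Stop the periodic dumps once the work is done.
    faulthandler.cancel_dump_traceback_later()
```

If the script stalls or is killed, the last checkpoint and the last dumped traceback show which line was running.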

@Yueqiao12Zhang
Contributor Author

Problem solved! Very dumb mistake 😭

@fujinaga
Member

> Problem solved! Very dumb mistake 😭

Great! No issues, we all make silly coding errors.
