Cannot open the file: MusicBrainz releases #155

Yueqiao12Zhang · 2024-08-19T14:12:24Z

The csv2rdf script uses too much memory when working with the releases data dumps: after the program has run for a long time, the kernel kills it (Killed: 9).

dchiller · 2024-08-22T14:01:58Z

Could you post the traceback here? Is this the memory issue we talked about in the lab meeting?

Yueqiao12Zhang · 2024-08-22T14:04:21Z

Yes. I will copy the error message from the lab Mac after the meeting.

Yueqiao12Zhang · 2024-08-22T14:06:02Z

CHUNK_SIZE = 10000

if __name__ == "__main__":

    # the file must be from MusicBrainz's JSON data dumps.
    chunk = []

    with open(inputpath, "r", encoding="utf-8") as f:
        print(1) # It does not reach this line.
        for line in f:
            line_data = json.loads(line)  # Parse each line as a JSON object
            chunk.append(line_data)  # Add the JSON object to the current chunk

            # When the chunk reaches the desired size, process it
            if len(chunk) == CHUNK_SIZE:
                print(2)
                extract(chunk, {})
                chunk.clear()  # Reset the chunk
                convert_dict_to_csv(values)
                values.clear()  # Reset the extracted values once they are written out

        # Process any remaining data in the last chunk
        if chunk:
            extract(chunk, {})
            chunk.clear()
            convert_dict_to_csv(values)

It works fine with smaller files.
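For reference, the chunked reading pattern in the snippet above can also be written as a generator, so the chunking logic stays separate from the processing. This is only a minimal sketch: `iter_json_chunks` is a hypothetical helper name that does not appear in the repository, and it assumes the input is a JSON Lines file (one JSON object per line), as the MusicBrainz dumps are described here.

```python
import json
from itertools import islice
from typing import Any, Iterator

CHUNK_SIZE = 10000  # same value as in the snippet above


def iter_json_chunks(path: str, chunk_size: int = CHUNK_SIZE) -> Iterator[list[Any]]:
    """Yield lists of at most chunk_size parsed objects from a JSON Lines file.

    Hypothetical helper for illustration; not part of the original script.
    """
    with open(path, "r", encoding="utf-8") as f:
        # Lazily parse one line at a time; blank lines are skipped so that
        # json.loads never receives an empty string.
        records = (json.loads(line) for line in f if line.strip())
        while True:
            # islice pulls at most chunk_size records from the lazy stream.
            chunk = list(islice(records, chunk_size))
            if not chunk:
                return
            yield chunk
```

With a helper like this, the main loop would reduce to iterating `for chunk in iter_json_chunks(inputpath):` and calling `extract` and `convert_dict_to_csv` on each chunk, so only one chunk of parsed objects is held in memory at a time.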

Yueqiao12Zhang · 2024-08-22T14:24:41Z

(local) DDMAL-004:csv local$ python3 convert_to_csv.py ../data/raw/extracted_jsonl/mbdump/release release
[1]+  Terminated: 15          python3 convert_to_csv.py ../data/raw/extracted_jsonl/mbdump/release release
Killed: 9

Yueqiao12Zhang · 2024-08-22T14:25:12Z

Python does not return an error message.
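A note on why there is no traceback: "Killed: 9" is how the shell reports that the process received SIGKILL (signal 9). That signal cannot be caught or handled by the process, so the interpreter never gets a chance to raise an exception or print anything. The sketch below illustrates this on a POSIX system such as the lab Mac; the child program and the `run_and_report` helper are made up for the illustration and are not part of the project.

```python
import signal
import subprocess
import sys

# Child program that sends SIGKILL to itself. It stands in for a process
# being killed from outside (illustrative only).
CHILD_CODE = "import os, signal; os.kill(os.getpid(), signal.SIGKILL)"


def run_and_report(code: str) -> str:
    """Run code in a child interpreter and describe how it ended."""
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    if result.returncode < 0:
        # On POSIX, a return code of -N means the child was ended by signal N.
        return f"killed by {signal.Signals(-result.returncode).name}"
    return f"exited with status {result.returncode}"


if __name__ == "__main__":
    print(run_and_report(CHILD_CODE))
```

Running the conversion under a wrapper like this would at least confirm from the return code that the script was ended by a signal rather than by a Python exception.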

dchiller · 2024-08-22T14:48:03Z

And so you know the place where it is erroring because it doesn't reach the print statement?

dchiller · 2024-08-22T15:02:12Z

Where is the python file you are running?

Yueqiao12Zhang · 2024-08-22T15:20:03Z

> And so you know the place where it is erroring because it doesn't reach the print statement?

Yes.

> Where is the python file you are running?

/csv2rdf/csv2rdf_single_subject.py

Yueqiao12Zhang · 2024-08-22T15:48:46Z

Problem solved! Very dumb mistake 😭

fujinaga · 2024-08-22T23:54:27Z

> Problem solved! Very dumb mistake 😭

Great! No issues, we all make silly coding errors.

Yueqiao12Zhang added the "Priority: high" label Aug 19, 2024

Yueqiao12Zhang self-assigned this Aug 19, 2024

Yueqiao12Zhang mentioned this issue Aug 20, 2024

MusicBrainz large file conversion #132

Closed

Yueqiao12Zhang changed the title ~~Killed 9 when running csv2rdf for MusicBrainz releases dumps~~ Cannot open the file: MusicBrainz releases Aug 22, 2024

Yueqiao12Zhang linked a pull request Aug 22, 2024 that will close this issue

Musicbrainz reduce memory used by processing by chunks #165

Merged

Yueqiao12Zhang closed this as completed Aug 23, 2024
