Musicbrainz reduce memory used by processing by chunks #165

Open
wants to merge 47 commits into
base: main
Changes from 1 commit
47 commits
aca1887
refactor: more ignored keys
Yueqiao12Zhang Aug 19, 2024
110263b
refactor: avoid new allocation
Yueqiao12Zhang Aug 19, 2024
f9d2091
doc: explain if statement
Yueqiao12Zhang Aug 19, 2024
f72eb1e
refactor: extract and convert in chunks
Yueqiao12Zhang Aug 19, 2024
48adb26
fix: write header
Yueqiao12Zhang Aug 19, 2024
87d9fe5
refactor: if not first level, then don't extract name
Yueqiao12Zhang Aug 19, 2024
6659d28
refactor: refresh values list
Yueqiao12Zhang Aug 19, 2024
e79de48
Update convert_to_csv.py
Yueqiao12Zhang Aug 20, 2024
6c565be
Revert "Update convert_to_csv.py"
Yueqiao12Zhang Aug 20, 2024
088c40d
refactor: read jsonl in chunks
Yueqiao12Zhang Aug 20, 2024
9ff292f
Merge branch 'main' into musicbrainz-reduce-memory-used
Yueqiao12Zhang Aug 21, 2024
2cbea9b
test: add print tests
Yueqiao12Zhang Aug 21, 2024
7ae3914
Merge branch 'musicbrainz-reduce-memory-used' of https://github.com/D…
Yueqiao12Zhang Aug 21, 2024
04b3c41
fix: delete buggy code
Yueqiao12Zhang Aug 22, 2024
8203c71
style: delete print test statements
Yueqiao12Zhang Aug 22, 2024
6e51f96
fix: memory bug and header writting
Yueqiao12Zhang Aug 23, 2024
e23ed58
Merge branch 'musicbrainz-reduce-memory-used' of https://github.com/D…
Yueqiao12Zhang Aug 23, 2024
740d365
Merge branch 'main' into musicbrainz-reduce-memory-used
Yueqiao12Zhang Aug 23, 2024
c0d1c78
Merge branch 'main' into musicbrainz-reduce-memory-used
Yueqiao12Zhang Aug 23, 2024
6c6f920
test: delete unused test files
candlecao Aug 23, 2024
99ff276
Update genre.csv
Yueqiao12Zhang Aug 23, 2024
e757df8
fix: header problem by making a temp csv
candlecao Aug 23, 2024
805c464
Update mapping.json
candlecao Aug 30, 2024
81f34e7
mapping: full columns mapping
candlecao Aug 30, 2024
92bfaf1
mapping: fill the empty mappings for updated CSVs
Yueqiao12Zhang Aug 30, 2024
581e86a
refactor: ignore iso codes since they are duplicated info
Yueqiao12Zhang Aug 30, 2024
6422088
refactor: recognize special math character
Yueqiao12Zhang Aug 30, 2024
89371de
doc: add made-up URLs
Yueqiao12Zhang Aug 30, 2024
e93147e
optimize: simplify input of convert_to_csv.py
Yueqiao12Zhang Aug 30, 2024
8dc8814
fix: syntax error
Yueqiao12Zhang Aug 30, 2024
397e95e
fix: entity type is the last part of file path
Yueqiao12Zhang Aug 30, 2024
0759de7
fix: output pathname
candlecao Aug 30, 2024
10f62c2
doc: update manual based on changes that removed commandline arguments
Yueqiao12Zhang Aug 30, 2024
738a7ed
Merge branch 'musicbrainz-reduce-memory-used' of https://github.com/D…
candlecao Aug 30, 2024
ceffbf1
Merge branch 'musicbrainz-reduce-memory-used' of https://github.com/D…
Yueqiao12Zhang Aug 30, 2024
66133e8
style: delete unused test code
Yueqiao12Zhang Aug 30, 2024
7cb07d0
doc: update docstring input to the correct number
Yueqiao12Zhang Aug 30, 2024
3f61a23
refactor: use writer instead of f.write()
Yueqiao12Zhang Aug 30, 2024
5a1783b
Revert "Update mapping.json"
Yueqiao12Zhang Aug 30, 2024
d47751e
doc: style update according to GPT
Yueqiao12Zhang Aug 30, 2024
3bad21e
doc: add specification for chunk_size
Yueqiao12Zhang Aug 30, 2024
6231da5
refactor: change list to set
Yueqiao12Zhang Sep 6, 2024
29dc943
fix: filename bug, readlines bug, opening wrong file bug
Yueqiao12Zhang Sep 6, 2024
28243a5
fix: auto escape by using \t not comma
Yueqiao12Zhang Sep 9, 2024
37201ab
feat: add quotechar to distinguish ", and , separator
Yueqiao12Zhang Sep 13, 2024
72b92e6
refactor: temp should be a tsv
Yueqiao12Zhang Sep 13, 2024
b64da8a
fix: correct a few temp.csv error
Yueqiao12Zhang Sep 13, 2024
4 changes: 2 additions & 2 deletions musicbrainz/csv/convert_to_csv.py
@@ -195,7 +195,7 @@ def convert_dict_to_csv(dictionary_list: list) -> None:
     with open(
         "temp.csv", mode="a", newline="", encoding="utf-8"
     ) as csv_records:
-        writer_records = csv.writer(csv_records)
+        writer_records = csv.writer(csv_records, delimiter="\t")
Review comment (Member):

This is no longer comma-separated now... it's a different format.

         writer_records.writerow(row)
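
To illustrate the reviewer's point about this hunk: once `delimiter="\t"` is passed, `csv.writer` emits tab-separated rows, so the temporary file is effectively a TSV even though it is still named `temp.csv`. The following is only a minimal sketch using an in-memory buffer; the sample row is invented for illustration and does not come from the MusicBrainz data.

```python
import csv
import io

# Hypothetical sample row (invented values): one field with a comma,
# one with a literal tab, one with double quotes.
row = ["Hello, World", "a\tb", 'say "hi"']

buffer = io.StringIO()
writer = csv.writer(buffer, delimiter="\t")
writer.writerow(row)
tsv_line = buffer.getvalue()

# With a tab delimiter the comma is no longer special, so the first field
# is written unquoted. Fields containing a tab or a quote are still quoted.
print(repr(tsv_line))

# A reader configured with the same delimiter recovers the original fields.
parsed = next(csv.reader(io.StringIO(tsv_line), delimiter="\t"))
print(parsed == row)
```

Because the writer may still quote some fields, whatever reads the temporary file back needs to use the same dialect rather than assume plain tab-joined text.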


@@ -233,6 +233,6 @@ def convert_dict_to_csv(dictionary_list: list) -> None:
         writer.writerow(header)

         for line in f_temp.readlines():
-            writer.writerow(line.strip().split(","))
+            writer.writerow(line.strip().split("\t"))

     os.remove("temp.csv")
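
One possible follow-up to this hunk, sketched under assumptions: `line.strip().split("\t")` does not undo any quoting that `csv.writer` applied when the temp file was written, `strip()` also drops trailing tabs (and with them trailing empty fields), and `readlines()` loads the whole temp file into memory. Reading the file back with `csv.reader` and the same delimiter avoids all three. The helper name `copy_temp_with_header`, its parameters, and the comma-separated output are hypothetical and not taken from this PR.

```python
import csv
import os
import tempfile


def copy_temp_with_header(temp_path: str, out_path: str, header: list) -> None:
    """Write the header, then stream rows from a tab-separated temp file.

    Hypothetical helper: the name, the signature, and the comma-separated
    output format are assumptions made for this sketch.
    """
    with open(temp_path, newline="", encoding="utf-8") as f_temp, open(
        out_path, mode="w", newline="", encoding="utf-8"
    ) as f_out:
        reader = csv.reader(f_temp, delimiter="\t")
        writer = csv.writer(f_out)
        writer.writerow(header)
        # Iterating the reader yields one parsed row at a time,
        # so the temp file is never held in memory as a whole.
        for row in reader:
            writer.writerow(row)
    os.remove(temp_path)


# Usage with invented data, written to a throwaway directory.
workdir = tempfile.mkdtemp()
temp_path = os.path.join(workdir, "temp.tsv")
out_path = os.path.join(workdir, "out.csv")

with open(temp_path, mode="w", newline="", encoding="utf-8") as f:
    csv.writer(f, delimiter="\t").writerows(
        [["1", "a, b", ""], ["2", "c\td", ""]]
    )

copy_temp_with_header(temp_path, out_path, ["id", "name", "note"])

with open(out_path, newline="", encoding="utf-8") as f:
    rows = list(csv.reader(f))
```

In this sketch the trailing empty `note` field and the field containing a literal tab both survive the round trip, which a manual `strip().split("\t")` would not guarantee.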