Datetime place format recognition #159

Merged
merged 54 commits on Oct 25, 2024
Changes from 4 commits
Commits
cf33751
refactor: reads geographic and datetime
Yueqiao12Zhang Aug 20, 2024
446a43d
test: update based on change
Yueqiao12Zhang Aug 20, 2024
f918248
Delete tunes.csv
Yueqiao12Zhang Aug 20, 2024
7a676f4
Merge branch 'main' into datetime-place-format-recognition
Yueqiao12Zhang Aug 22, 2024
76a8e37
Merge branch 'main' into datetime-place-format-recognition
Yueqiao12Zhang Aug 23, 2024
212d343
refactor: complie pattern matching csv2rdf/csv2rdf_single_subject.py
Yueqiao12Zhang Aug 23, 2024
1991475
refactor: compile pattern csv2rdf/csv2rdf_single_subject.py
Yueqiao12Zhang Aug 23, 2024
88eb0ac
fix: reconciliation add missing countries
Yueqiao12Zhang Aug 30, 2024
fca03ee
Merge branch 'main' into datetime-place-format-recognition
Yueqiao12Zhang Aug 30, 2024
7aaaa8b
merge: fix conflict
Yueqiao12Zhang Aug 30, 2024
8fd007e
gitignore: ignore output since it's too large
Yueqiao12Zhang Aug 30, 2024
86a6aaa
refactor: change output coordinate type
Yueqiao12Zhang Aug 30, 2024
bf0fa91
style: correct syntax
Yueqiao12Zhang Aug 30, 2024
47f2af4
test: correct coordinate, todo: correct datetime
Yueqiao12Zhang Aug 30, 2024
ce6a873
test: update datetime format and empty coordinate
Yueqiao12Zhang Sep 6, 2024
094a3b0
test: change the openrefine history json based on new format change
Yueqiao12Zhang Sep 6, 2024
7f584c0
doc: manual update the procedure for the session reconciling
Yueqiao12Zhang Sep 6, 2024
c9969cb
doc: fix text transform code
Yueqiao12Zhang Sep 6, 2024
ce79903
test: change the openrefine history for updated text transform code
Yueqiao12Zhang Sep 6, 2024
bd34b49
test: events.csv remove empty point coordinate
Yueqiao12Zhang Sep 6, 2024
7fcf81f
test: move to csv2rdf folder for testing
Yueqiao12Zhang Sep 6, 2024
f9f226c
feat: correctly checks for digits, and ignore digits in artist name
Yueqiao12Zhang Sep 6, 2024
e860302
test: add lang tag
Yueqiao12Zhang Sep 6, 2024
76c5ded
fix: correct find_artist filename
Yueqiao12Zhang Sep 6, 2024
9847d33
feat: use python datetime to add day of the week to the rdf
Yueqiao12Zhang Sep 6, 2024
be7b076
test: add day of the week
Yueqiao12Zhang Sep 6, 2024
a937458
Revert "test: add day of the week"
Yueqiao12Zhang Sep 6, 2024
e1a1ff6
feat: use datetime obj to recognize and reformat the datetime string
Yueqiao12Zhang Sep 6, 2024
f40aee7
test: remove the history that adds the "T" in datetime
Yueqiao12Zhang Sep 6, 2024
ce97fa5
test: "T" in datetime removed
Yueqiao12Zhang Sep 6, 2024
0e1da14
refactor: create a set containing string type columns
Yueqiao12Zhang Sep 6, 2024
19166c1
test: full output
Yueqiao12Zhang Sep 6, 2024
c3b5650
refactor: remove lang tag
candlecao Sep 9, 2024
1a791bc
Update csv2rdf_single_subject.py
candlecao Sep 9, 2024
524638a
Update out_rdf.ttl
candlecao Sep 9, 2024
28f3698
Modify the property value for "artist" in mapping.json
candlecao Sep 11, 2024
48d8d29
Update out_rdf.ttl
candlecao Sep 11, 2024
0cd3e4b
Update mapping.json
candlecao Sep 12, 2024
9012a48
Update recordings-csv.csv
candlecao Sep 12, 2024
24b275f
Create recordings-csv_onlyForArtist_wiki.csv
candlecao Sep 12, 2024
02fc059
Update mapping.json
candlecao Sep 12, 2024
486d3f7
Merge branch 'datetime-place-format-recognition' of https://github.co…
candlecao Sep 12, 2024
e4508dc
style: format update
Yueqiao12Zhang Sep 20, 2024
d297532
Update reconcile_procedures.md
candlecao Sep 20, 2024
3f62dd7
fix: remove incorrect exponent form
Yueqiao12Zhang Sep 20, 2024
5fe16c9
Merge branch 'datetime-place-format-recognition' of https://github.co…
Yueqiao12Zhang Sep 20, 2024
76af8e4
Update csv2rdf_single_subject.py
candlecao Oct 21, 2024
824f14b
Update out_rdf.ttl
candlecao Oct 21, 2024
47f878b
Update recordings-csv.csv
candlecao Oct 21, 2024
a545edd
Delete recordings-csv_onlyForArtist_wiki.csv
candlecao Oct 21, 2024
1bd076a
Create recordings-csv_onlyForArtist_wiki.csv
candlecao Oct 21, 2024
55a74b6
Delete recordings-csv_onlyForArtist_wiki.csv
candlecao Oct 25, 2024
2ebfae2
Merge branch 'main' into datetime-place-format-recognition
candlecao Oct 25, 2024
5e50091
Merge branch 'main' into datetime-place-format-recognition
candlecao Oct 25, 2024
10 changes: 8 additions & 2 deletions csv2rdf/csv2rdf_single_subject.py
@@ -17,15 +17,17 @@
 import json
 import os
 import validators
+import re
 from rdflib import Graph, URIRef, Literal
-from rdflib.namespace import RDF, XSD
+from rdflib.namespace import RDF, XSD, WGS

 # The "type" attribute of each CSV file must be entered in the mapper file in the
 # same order as the input in commandline.

 DIRNAME = os.path.dirname(__file__)
 mapping_filename = os.path.join(DIRNAME, sys.argv[1])
 dest_filename = os.path.join(os.path.dirname(mapping_filename), "out_rdf.ttl")
+DT_PATTERN = r'^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}$'


 def convert_csv_to_turtle(filenames: List[str]) -> Graph:
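
Note on `DT_PATTERN`: the pattern checks only the shape of the string, so a value such as `2024-13-45T99:99:99` would still be typed as `xsd:dateTime`, and `re.match` with a trailing `$` also accepts a string that ends in a newline. A later commit in this PR (e1a1ff6) mentions using a datetime object for recognition. The following is a minimal sketch of that idea, not the code from that commit; `DT_REGEX` and `is_xsd_datetime` are hypothetical names, and the sketch assumes the only accepted format is `YYYY-MM-DDTHH:MM:SS`.

```python
import re
from datetime import datetime

# Same shape as DT_PATTERN in the diff, compiled once.
# fullmatch() makes the ^ and $ anchors unnecessary and rejects a trailing newline.
DT_REGEX = re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}")


def is_xsd_datetime(element: str) -> bool:
    """Hypothetical helper: True only for a real calendar date-time
    written as YYYY-MM-DDTHH:MM:SS."""
    if DT_REGEX.fullmatch(element) is None:
        return False
    try:
        # strptime raises ValueError for impossible values such as month 13.
        datetime.strptime(element, "%Y-%m-%dT%H:%M:%S")
    except ValueError:
        return False
    return True
```

Under these assumptions, the `re.match(DT_PATTERN, element)` branch could call `is_xsd_datetime(element)` instead, so that only valid timestamps receive the `XSD.dateTime` datatype.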
@@ -84,6 +86,10 @@ def convert_csv_to_turtle(filenames: List[str]) -> Graph:
     obj = Literal(element, datatype=XSD.boolean)
 elif element.isnumeric():
     obj = Literal(element, datatype=XSD.integer)
+elif element.startswith("Point("):
+    obj = Literal(element[5:], datatype=WGS.Point)
+elif re.match(DT_PATTERN, element):
+    obj = Literal(element, datatype=XSD.dateTime)
 else:
     obj = Literal(element)

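Note on the `Point(` branch: `element[5:]` removes the five characters of `Point` and keeps the parenthesised part, so a cell like `Point(-73.5673 45.5017)` is stored as the literal `(-73.5673 45.5017)`. The `startswith` test also accepts a cell with nothing between the parentheses. If the coordinates need to be checked before typing the literal, something along these lines might work. This is a sketch only: `POINT_REGEX` and `parse_point` are hypothetical names, the sample coordinates are made up, and the sketch assumes two plain decimal numbers separated by whitespace (no exponent form).

```python
import re
from typing import Optional, Tuple

# Assumed cell shape: "Point(<first> <second>)" with two plain decimal numbers.
POINT_REGEX = re.compile(
    r"Point\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)"
)


def parse_point(element: str) -> Optional[Tuple[float, float]]:
    """Hypothetical helper: return the two coordinates in the order they
    appear, or None when the cell is not a well-formed point."""
    match = POINT_REGEX.fullmatch(element)
    if match is None:
        return None
    return float(match.group(1)), float(match.group(2))
```

A caller could skip the cell, or fall back to a plain literal, whenever `parse_point` returns `None`, which covers the empty `Point()` case.
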
@@ -98,4 +104,4 @@ def convert_csv_to_turtle(filenames: List[str]) -> Graph:

 fns = sys.argv[2:]
 turtle_data = convert_csv_to_turtle(fns)
-turtle_data.serialize(format="turtle", destination=dest_filename)
+turtle_data.serialize(format="turtle", destination=dest_filename,)