new: preserve_adb_keys in PyG to ArangoDB #11

Merged
merged 70 commits into from
Sep 21, 2022
70 commits
0bf2609
initial commit
aMahanna Aug 1, 2022
42bb841
temp: create py310 database for 3.10 testing
aMahanna Aug 1, 2022
d8d151b
temp: black hack
aMahanna Aug 1, 2022
d6dab74
Update conftest.py
aMahanna Aug 1, 2022
e91cff1
Update build.yml
aMahanna Aug 1, 2022
29eeb63
cleanup build.yml & release.yml
aMahanna Aug 1, 2022
fde393d
remove: temp conftest hack
aMahanna Aug 1, 2022
acbd39c
cleanup: abc
aMahanna Aug 1, 2022
97084a7
new: optional & partial edge collectiond data transfer (ArangoDB to PyG)
aMahanna Aug 1, 2022
eef7018
fix: black
aMahanna Aug 1, 2022
f5f3f0a
temp: disable HeterogeneousPartialEdgeCollectionImport
aMahanna Aug 1, 2022
2e2d6f4
new: test_adb_partial_to_pyg
aMahanna Aug 1, 2022
1ac09cf
new: query/dataframe optimization
aMahanna Aug 1, 2022
32a234b
cleanup
aMahanna Aug 2, 2022
78b15b2
Update README.md
aMahanna Aug 2, 2022
d4a4608
Update README.md
aMahanna Aug 2, 2022
6333dd4
cleanup: `validate_adb_metagraph`
aMahanna Aug 2, 2022
b6455a5
fix: black
aMahanna Aug 2, 2022
ca09c73
Merge branch 'master' into feature/adbpyg-map
aMahanna Aug 2, 2022
2b1d1ea
initial (experimental) commit
aMahanna Aug 2, 2022
4bb66f1
checkpoint
aMahanna Aug 3, 2022
aca385a
new: lazy attempt at #4
aMahanna Aug 3, 2022
6c5ed73
new: `preserve_adb_keys` docstring
aMahanna Aug 3, 2022
0249467
new: `pytest_exception_interact`
aMahanna Aug 3, 2022
84d62b1
temp: disable (partial) feature validation in `assert_arangodb_data`
aMahanna Aug 3, 2022
f91c8f0
cleanup: adapter.py
aMahanna Aug 3, 2022
a8b8b73
move: `preserve_adb_keys`
aMahanna Aug 3, 2022
e17be93
cleanup: `pyg_keys`
aMahanna Aug 3, 2022
63ea00d
new: test cases to cover `preserve_adb_keys`
aMahanna Aug 3, 2022
f787e64
temp: `# flake8: noqa`
aMahanna Aug 3, 2022
bf14832
debug: `pytest_exception_interact`
aMahanna Aug 3, 2022
03e6ea3
temp: fix cudf to_dict error
aMahanna Aug 3, 2022
17d008d
fix: black
aMahanna Aug 3, 2022
ab41595
fix: typo
aMahanna Aug 3, 2022
847ebae
remove: `cudf` imports
aMahanna Aug 3, 2022
a9734a9
fix: `test_adb_partial_to_pyg` RNG
aMahanna Aug 3, 2022
ef135a6
cleanup: `__finish_adb_dataframe` and `__build_dataframe_from_tensor`
aMahanna Aug 3, 2022
bd384b9
fix: map typings
aMahanna Aug 4, 2022
4d01224
new: test_adapter.py refactor
aMahanna Aug 4, 2022
a49d9d3
new: `preserve_adb_keys` refactor
aMahanna Aug 4, 2022
82e02f8
debug: test `HeterogeneousTurnedHomogeneous`
aMahanna Aug 4, 2022
79ae252
cleanup: test_adapter
aMahanna Aug 4, 2022
83799a7
update docstring
aMahanna Aug 4, 2022
d9073fe
fix: flake8
aMahanna Aug 4, 2022
e83bcd8
update: docstrings
aMahanna Aug 4, 2022
011911b
fix: default param value
aMahanna Aug 4, 2022
ee40271
fix: docstring
aMahanna Aug 4, 2022
f114b9a
cleanup
aMahanna Aug 4, 2022
c09987a
fix: black
aMahanna Aug 4, 2022
c4182c0
new: Full Cycle README section
aMahanna Aug 4, 2022
d7e2e19
update release.yml
aMahanna Aug 4, 2022
fa2e4cb
update `explicit_metagraph` docstring
aMahanna Aug 4, 2022
c56196b
Update README.md
aMahanna Aug 4, 2022
751fd28
move: __build_tensor_from_dataframe
aMahanna Aug 4, 2022
abf477b
bump
aMahanna Aug 4, 2022
d627902
Revert "bump"
aMahanna Aug 4, 2022
f47a12e
Update test_adapter.py
aMahanna Aug 4, 2022
c0eb5b4
new: `set[str]` metagraph value type
aMahanna Aug 5, 2022
4637506
fix: flake8
aMahanna Aug 5, 2022
123df0e
Update README.md
aMahanna Aug 5, 2022
e72a396
Update README.md
aMahanna Aug 5, 2022
4dcd032
cleanup: test_adapter
aMahanna Aug 5, 2022
ae47341
cleanup: progress bars
aMahanna Aug 5, 2022
df07def
update: documentation
aMahanna Aug 5, 2022
a292744
new: address comments
aMahanna Aug 5, 2022
7e31443
new: `test_full_cycle_homogeneous_with_preserve_adb_keys`
aMahanna Aug 5, 2022
c9f66a8
fix: black & mypy
aMahanna Aug 5, 2022
867729f
Update README.md
aMahanna Aug 5, 2022
92bf916
Update README.md
aMahanna Aug 5, 2022
2d5c135
new: adbpyg 1.1.0 notebook
aMahanna Aug 5, 2022
3 changes: 3 additions & 0 deletions .github/workflows/release.yml
@@ -73,6 +73,9 @@ jobs:
pip install torch
pip install .[dev]

- name: Remove (old) distribution
run: rm -rf dist

- name: Build distribution
run: conda run -n ${{ matrix.python }} python setup.py sdist bdist_wheel

44 changes: 34 additions & 10 deletions README.md
@@ -69,6 +69,9 @@ adbpyg_adapter = ADBPyG_Adapter(db)
```

### PyG to ArangoDB

Note: If the PyG graph contains `_key`, `_v_key`, or `_e_key` properties for any node / edge types, the adapter will persist those values as [ArangoDB document keys](https://www.arangodb.com/docs/stable/data-modeling-naming-conventions-document-keys.html). See the `Full Cycle (ArangoDB -> PyG -> ArangoDB)` section below for an example.
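
As a rough illustration of the note above, the sketch below shows the idea of reusing a `_key` list as document keys when node data is turned into documents. `build_documents` is a hypothetical helper, not part of the adapter's API, and the fallback to the node index when no keys are supplied is an assumption made for this example rather than documented behavior.

```python
from typing import Any, Dict, List, Optional


def build_documents(
    num_nodes: int,
    features: List[List[float]],
    keys: Optional[List[str]] = None,
) -> List[Dict[str, Any]]:
    """Build ArangoDB-style documents for one node type.

    Hypothetical helper for illustration only; not part of adbpyg-adapter.
    """
    if keys is not None and len(keys) != num_nodes:
        raise ValueError("expected exactly one _key per node")

    docs = []
    for i in range(num_nodes):
        # Assumption: without user-supplied keys, the node index becomes the key.
        key = keys[i] if keys is not None else str(i)
        docs.append({"_key": key, "x": features[i]})
    return docs


docs_with_keys = build_documents(2, [[0.1], [0.2]], keys=["alice", "bob"])
docs_default = build_documents(2, [[0.1], [0.2]])

print([d["_key"] for d in docs_with_keys])  # ['alice', 'bob']
print([d["_key"] for d in docs_default])    # ['0', '1']
```

Keeping the keys alongside the features is what lets a later re-import target the same documents instead of creating new ones.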

```py
# 1.1: PyG to ArangoDB
adb_g = adbpyg_adapter.pyg_to_arangodb("FakeData", data)
@@ -93,13 +96,15 @@ def y_tensor_to_2_column_dataframe(pyg_tensor):
metagraph = {
"nodeTypes": {
"v0": {
- "x": "features", # 1) you can specify a string value for attribute renaming
+ "x": "features", # 1) You can specify a string value if you want to rename your PyG data when stored in ArangoDB
"y": y_tensor_to_2_column_dataframe, # 2) you can specify a function for user-defined handling, as long as the function returns a Pandas DataFrame
},
# 3) You can specify a set of strings if you want to preserve the same PyG attribute names for the node/edge type
"v1": {"x"} # this is equivalent to {"x": "x"}
},
"edgeTypes": {
("v0", "e0", "v0"): {
- # 3) you can specify a list of strings for tensor dissasembly (if you know the number of node/edge features in advance)
+ # 4) You can specify a list of strings for tensor disassembly (if you know the number of node/edge features in advance)
"edge_attr": [ "a", "b"]
},
},
@@ -110,7 +115,7 @@ adb_g = adbpyg_adapter.pyg_to_arangodb("FakeData", data, metagraph, explicit_met

# 1.3: PyG to ArangoDB with the same (optional) metagraph, but with `explicit_metagraph=True`
# With `explicit_metagraph=True`, the node & edge types omitted from the metagraph will NOT be converted to ArangoDB.
- # Only 'v0' and ('v0', 'e0', 'v0') will be brought over (i.e 'v1', ('v0', 'e0', 'v1'), ... are ignored)
+ # Only 'v0', 'v1' and ('v0', 'e0', 'v0') will be brought over (i.e. 'v2', ('v0', 'e0', 'v1'), ... are ignored)
adb_g = adbpyg_adapter.pyg_to_arangodb("FakeData", data, metagraph, explicit_metagraph=True)

# 1.4: PyG to ArangoDB with a Custom Controller (more user-defined behavior)
@@ -155,12 +160,12 @@ pyg_g = adbpyg_adapter.arangodb_collections_to_pyg("FakeData", v_cols={"v0", "v1
# 2.3: ArangoDB to PyG via Metagraph v1 (transfer attributes "as is", meaning they are already formatted to PyG data standards)
metagraph_v1 = {
"vertexCollections": {
- # we instruct the adapter to create the "x" and "y" tensor data from the "x" and "y" ArangoDB attributes
- "v0": { "x": "x", "y": "y"},
- "v1": {"x": "x"},
+ # Move the "x" & "y" ArangoDB attributes to PyG as "x" & "y" Tensors
+ "v0": {"x", "y"}, # equivalent to {"x": "x", "y": "y"}
+ "v1": {"v1_x": "x"}, # store the 'x' feature matrix as 'v1_x' in PyG
},
"edgeCollections": {
- "e0": {"edge_attr": "edge_attr"},
+ "e0": {"edge_attr"},
},
}
pyg_g = adbpyg_adapter.arangodb_to_pyg("FakeData", metagraph_v1)
@@ -184,9 +189,7 @@ metagraph_v2 = {
},
},
"edgeCollections": {
- "Ratings": {
- "edge_weight": "Rating"
- }
+ "Ratings": { "edge_weight": "Rating" } # Use the 'Rating' attribute for the PyG 'edge_weight' property
},
}
pyg_g = adbpyg_adapter.arangodb_to_pyg("IMDB", metagraph_v2)
@@ -219,6 +222,27 @@ metagraph_v3 = {
pyg_g = adbpyg_adapter.arangodb_to_pyg("FakeData", metagraph_v3)
```
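
The PyG to ArangoDB metagraph examples above accept several value forms: a string (rename), a set of strings (keep the same names), and a list of strings (split a feature row into named columns). The following is a minimal sketch of how such forms could be interpreted, using plain Python lists in place of tensors. `normalize` and `to_rows` are hypothetical names for illustration and are not the adapter's implementation; the function-valued form is left out for brevity.

```python
from typing import Any, Dict, List, Set, Union

# Hypothetical alias: a string renames an attribute, a list splits it into columns.
MetaValue = Union[str, List[str]]
Meta = Union[Set[str], Dict[str, MetaValue]]


def normalize(meta: Meta) -> Dict[str, MetaValue]:
    """Expand the set shorthand, so {"x"} becomes {"x": "x"}."""
    if isinstance(meta, set):
        return {name: name for name in meta}
    return dict(meta)


def to_rows(meta: Meta, data: Dict[str, List[List[float]]]) -> List[Dict[str, Any]]:
    """Turn per-node feature rows into one attribute dict per node."""
    rows: List[Dict[str, Any]] = []
    for pyg_name, target in normalize(meta).items():
        for i, feature_row in enumerate(data[pyg_name]):
            if i == len(rows):
                rows.append({})
            if isinstance(target, str):
                # String form: store the whole feature row under the target name.
                rows[i][target] = feature_row
            else:
                # List form: one named attribute per feature column.
                if len(target) != len(feature_row):
                    raise ValueError("need one name per feature column")
                rows[i].update(zip(target, feature_row))
    return rows


data = {"x": [[1.0, 2.0], [3.0, 4.0]]}

print(to_rows({"x"}, data))              # [{'x': [1.0, 2.0]}, {'x': [3.0, 4.0]}]
print(to_rows({"x": "features"}, data))  # [{'features': [1.0, 2.0]}, {'features': [3.0, 4.0]}]
print(to_rows({"x": ["a", "b"]}, data))  # [{'a': 1.0, 'b': 2.0}, {'a': 3.0, 'b': 4.0}]
```

The list form only works when the number of names matches the number of feature columns, which is why the README comment asks that the feature count be known in advance.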

### Experimental: `preserve_adb_keys`
```py
# With `preserve_adb_keys=True`, the adapter carries the ArangoDB vertex & edge _key values over into the (newly created) PyG graph.
# Users can then re-import their PyG graph into ArangoDB using the same _key values
pyg_g = adbpyg_adapter.arangodb_graph_to_pyg("imdb", preserve_adb_keys=True)

# pyg_g["Movies"]["_key"] --> ["1", "2", ..., "1682"]
# pyg_g["Users"]["_key"] --> ["1", "2", ..., "943"]
# pyg_g[("Users", "Ratings", "Movies")]["_key"] --> ["2732620466", ..., "2730643624"]

# Let's add a new PyG User Node by updating the _key property
pyg_g["Users"]["_key"].append("new-user-here-944")

# Note: Prior to the re-import, we must manually set the number of nodes in the PyG graph, since the `arangodb_graph_to_pyg` API creates featureless node data
pyg_g["Movies"].num_nodes = len(pyg_g["Movies"]["_key"]) # 1682
pyg_g["Users"].num_nodes = len(pyg_g["Users"]["_key"]) # 944 (prev. 943)

# Re-import PyG graph into ArangoDB
adbpyg_adapter.pyg_to_arangodb("imdb", pyg_g, on_duplicate="update")
```
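
To make the round trip above more concrete, here is a small sketch of the underlying idea: document keys are mapped to integer node positions on the way into PyG, and the stored key lists are used to map positions back on the way out. All function names are hypothetical, and real ArangoDB edges reference full `_id` values (`collection/key`) in `_from` / `_to`; bare keys are used here only to keep the example short.

```python
from typing import Dict, List, Tuple


def index_keys(keys: List[str]) -> Dict[str, int]:
    """Map each document key to its position, i.e. its node index."""
    return {key: i for i, key in enumerate(keys)}


def edges_to_index(
    edges: List[Tuple[str, str]],
    src_map: Dict[str, int],
    dst_map: Dict[str, int],
) -> List[List[int]]:
    """Convert (from_key, to_key) pairs into a 2 x num_edges index layout."""
    src = [src_map[from_key] for from_key, _ in edges]
    dst = [dst_map[to_key] for _, to_key in edges]
    return [src, dst]


def index_to_edges(
    edge_index: List[List[int]],
    src_keys: List[str],
    dst_keys: List[str],
) -> List[Tuple[str, str]]:
    """Recover the original key pairs from the preserved key lists."""
    return [(src_keys[s], dst_keys[d]) for s, d in zip(*edge_index)]


user_keys = ["1", "2", "3"]
movie_keys = ["10", "20"]
ratings = [("1", "10"), ("3", "20"), ("2", "10")]

edge_index = edges_to_index(ratings, index_keys(user_keys), index_keys(movie_keys))
print(edge_index)  # [[0, 2, 1], [0, 1, 0]]

# Appending a new key leaves existing positions untouched, so old edges still resolve.
user_keys.append("new-user-here-944")
restored = index_to_edges(edge_index, user_keys, movie_keys)
print(restored == ratings)  # True
```

Appending to the key list (rather than inserting or reordering) is what keeps the existing indices valid, which matches how the README example adds its new user.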

## Development & Testing

Prerequisite: `arangorestore`