forked from NCI-GDC/gdcdictionary
-
Notifications
You must be signed in to change notification settings - Fork 10
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
aff0e6b
commit 990e6df
Showing
1 changed file
with
3 additions
and
3 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -1,9 +1,9 @@ | ||
# Design Notes | ||
|
||
One important aspect worth mentioning is that it is purposely chosen to model the dictionary using Directed Acyclic Graph (DAG). The idea behinds it is simpilicity! From a practical point of view, the data dictionary is not meant to model every aspects of the real world GDC entities and their relations with fine grain semantics. Rather, it's important to be able to express and enforce data integrity rules. So far, DAG seems to be a good fit although we should look harder to see whether there are any show-stoppers, are there any relations/rules can not be expressed using DAG and there is no way around it? | ||
One important aspect worth mentioning is that it is purposely chosen to model the dictionary using Directed Acyclic Graph (DAG). The idea behinds it is simplicity! From a practical point of view, the data dictionary is not meant to model every aspects of the real world GDC entities and their relations with fine grain semantics. Rather, it's important to be able to express and enforce data integrity rules. So far, DAG seems to be a good fit although we should look harder to see whether there are any show-stoppers, are there any relations/rules can not be expressed using DAG and there is no way around it? | ||
|
||
Choosing simplier design will always bring benefits in software development at all levels and phases. In our case, comparing to a general graph, DAG will be much easier to work with. Querying a DAG will be simply searching up to find parents or searching down to find children; finding siblings will require one step up and one step down, which should be performant. As any other graph search, querying a DAG with very wide or deep structure can be expensive. There should be tricks we can play to make the queries we care about performant. We can also keep it in mind while designing the model to avoid very wide or deep DAG whenever possible. | ||
Choosing simpler design will always bring benefits in software development at all levels and phases. In our case, comparing to a general graph, DAG will be much easier to work with. Querying a DAG will be simply searching up to find parents or searching down to find children; finding siblings will require one step up and one step down, which should be performant. As any other graph search, querying a DAG with very wide or deep structure can be expensive. There should be tricks we can play to make the queries we care about performant. We can also keep it in mind while designing the model to avoid very wide or deep DAG whenever possible. | ||
|
||
Taking one step further, practically, type of relations between nodes may not be so important to us. Just like ER modeling in RDBMS, uniqueness/cadinality is the only thing matters. With this thinking, it's possible to entirely eliminate the need for an edge table while implement the DAG model. We will still need to support some not so straightforward relations such as conditional relations, such as when there is A, there should (or shouldn't) be B etc. However such business logic is not necessarily harder for DAG to handle. | ||
Taking one step further, practically, types of relations between nodes may not be so important to us. Just like ER modeling in RDBMS, uniqueness/cardinality is the only thing matters. With this thinking, it's possible to entirely eliminate the need for an edge table while implementing the DAG model. We will still need to support some not so straightforward relations such as conditional relations, such as when there is A, there should (or shouldn't) be B etc. However such business logic is not necessarily harder for DAG to handle. | ||
|
||
One last point is that converting data in a DAG to JSON should be easier comparing to a general graph. Data in a JSON document is essentially a tree. When converting a DAG to a tree, it is mainly to denormalize child nodes with multiple parents into multiple copies, each parent will have a materialized local child node copy. This should make the logic cleaner when we export data in graph db to JSON to build Elasticsearch indexes. |