Commit

Merge pull request #49 from gleanerio/df-dev
@valentinedwv Thanks for all the great feedback and code review!
fils authored Sep 1, 2023
2 parents 749f4fe + 9d9bcf0 commit 6a4f036
Showing 45 changed files with 85,908 additions and 513 deletions.
2 changes: 1 addition & 1 deletion VERSION
@@ -1 +1 @@
2.0.8-development
2.0.18-df-development
50 changes: 42 additions & 8 deletions config/example.yaml
@@ -5,6 +5,9 @@ minio:
  accesskey: akey
  secretkey: skey
  bucket: gleaner
  region: null
implementation_network:
  orgname: iow
context:
  cache: true
  strict: true
@@ -19,11 +22,42 @@ objects:
- summoned/providera
- prov/providera
- org
sparql:
  endpoint: http://localhost/blazegraph/namespace/earthcube/sparql
  endpointBulk: http://coreos.lan:3030/testing/data
  endpointMethod: POST
  contentType: application/n-quads
  authenticate: false
  username: ""
  password: ""
endpoints:
  - service: ec_blazegraph
    baseurl: http://coreos.lan:9090/blazegraph/namespace/iow
    type: blazegraph
    authenticate: false
    username: admin
    password: jfpwd
    modes:
      - action: sparql
        suffix: /sparql
        accept: application/sparql-results+json
        method: GET
      - action: update
        suffix: /sparql
        accept: application/sparql-update
        method: POST
      - action: bulk
        suffix: /sparql
        accept: text/x-nquads
        method: POST
  - service: iow_graphdb
    baseurl: http://coreos.lan:7200/repositories/testing
    type: graphdb
    authenticate: false
    username: admin
    password: jfpw
    modes:
      - action: sparql
        suffix: # no suffix needed for GraphDB
        accept: application/sparql-results+json
        method: GET
      - action: update
        suffix: /statements
        accept: application/sparql-update
        method: POST
      - action: bulk
        suffix: /statements
        accept: text/x-nquads
        method: POST
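
The new `endpoints` layout pairs each service with a list of `modes`, so a caller can look up the HTTP method and full URL for an action such as `bulk`. The Go sketch below illustrates one way that lookup could work. The `Endpoint` and `Mode` types and the `resolve` helper are hypothetical names chosen for this illustration, not Nabu's actual types, and the sketch assumes the URL is simply `baseurl` followed by `suffix`.

```go
package main

import (
	"fmt"
	"strings"
)

// Mode mirrors one entry under `modes` in the example config (hypothetical type).
type Mode struct {
	Action string
	Suffix string
	Accept string
	Method string
}

// Endpoint mirrors one entry under `endpoints` (hypothetical type).
type Endpoint struct {
	Service string
	BaseURL string
	Modes   []Mode
}

// resolve returns the HTTP method and full URL for a service/action pair.
// Assumption: the target URL is baseurl with the mode's suffix appended.
func resolve(endpoints []Endpoint, service, action string) (method, target string, err error) {
	for _, ep := range endpoints {
		if ep.Service != service {
			continue
		}
		for _, m := range ep.Modes {
			if m.Action == action {
				return m.Method, strings.TrimRight(ep.BaseURL, "/") + m.Suffix, nil
			}
		}
		return "", "", fmt.Errorf("service %q has no mode %q", service, action)
	}
	return "", "", fmt.Errorf("unknown service %q", service)
}

func main() {
	// Values copied from the iow_graphdb entry in the example config.
	endpoints := []Endpoint{{
		Service: "iow_graphdb",
		BaseURL: "http://coreos.lan:7200/repositories/testing",
		Modes: []Mode{
			{Action: "sparql", Suffix: "", Accept: "application/sparql-results+json", Method: "GET"},
			{Action: "update", Suffix: "/statements", Accept: "application/sparql-update", Method: "POST"},
			{Action: "bulk", Suffix: "/statements", Accept: "text/x-nquads", Method: "POST"},
		},
	}}

	method, target, err := resolve(endpoints, "iow_graphdb", "bulk")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(method, target)
}
```

Keeping the suffix per mode is what lets one service definition cover both Blazegraph (a single `/sparql` path) and GraphDB (no suffix for queries, `/statements` for writes).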
52 changes: 27 additions & 25 deletions decisions/0001-URN-decision.md
@@ -8,45 +8,47 @@ Proposed

## Context

URNs for the graph URI are set in the file internal/graph/mintURN.go

current
```
urn:{bucket}:{provider}:{sha}
```

proposed
```
urn:gleanerio:{network}:{provider}:{sha}
```

As JSON-LD documents, representing data graphs, are collected from sources they
need to be processed in the graph. When doing this we generate a named graph URN
to identify the set of triples coming from a given document.

## Decision

Old URNs were variations on
The desired URN pattern would then look like the following.
These would likely always be pulled from the summoned prefix, and as such be JSON-LD.

```rdf
urn:gleaner.io:summoned:edmo:0255293683036aac2a95a2479cc841189c0ac3f8
urn:{engine}:{implnet}:{source}:{type}:{sha}
```

or
* engine: In our case always _gleaner.io_ to represent the code base used. Other
groups may wish to use other packages, such as Apache systems, and can denote that here.
The value is small, but it does give some evidence of the tools used, which may have an impact.
* implnet: The implementing network or organization doing the activity. This should be
one word, lower case and all alphanumeric. So things like: oih, decoder, geocodes, iow, polder, etc.
* source: The name of the source from the Gleaner configuration file. It should also be
one word, lower case and all alphanumeric. So things like: bcodmo, aquadocs, iris, etc.
* type: One of:
  * data: representing the data graphs collected
  * prov: representing the prov graphs describing the collection process
  * org: representing the organization data graphs generated by Gleaner for a source
* sha: The sha hash generated.

Populated examples might look like:

```rdf
urn:gleaner.io:milled:edmo:0255293683036aac2a95a2479cc841189c0ac3f8
urn:gleaner.io:oih:edmo:prov:0255293683036aac2a95a2479cc841189c0ac3f8
or
urn:gleaner.io:iow:counties0:data:00010f9f071c39fcc0ca73eccad7470b675cd8a3
```
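
As a rough illustration of the proposed pattern, the Go sketch below assembles a URN from its parts and enforces the "one word, lower case and all alphanumeric" rule for `implnet` and `source`. The function name `mintURN` and its signature are assumptions made for this example; the actual code in internal/graph/mintURN.go may be organized differently.

```go
package main

import (
	"fmt"
	"regexp"
)

// oneWord matches a single lower-case alphanumeric token, as the ADR
// requires for implnet and source.
var oneWord = regexp.MustCompile(`^[a-z0-9]+$`)

// graphTypes lists the three type values named in the ADR.
var graphTypes = map[string]bool{"data": true, "prov": true, "org": true}

// mintURN is a hypothetical helper that builds
// urn:gleaner.io:{implnet}:{source}:{type}:{sha}.
func mintURN(implnet, source, graphType, sha string) (string, error) {
	if !oneWord.MatchString(implnet) {
		return "", fmt.Errorf("implnet %q must be one lower-case alphanumeric word", implnet)
	}
	if !oneWord.MatchString(source) {
		return "", fmt.Errorf("source %q must be one lower-case alphanumeric word", source)
	}
	if !graphTypes[graphType] {
		return "", fmt.Errorf("type %q must be one of data, prov, org", graphType)
	}
	if sha == "" {
		return "", fmt.Errorf("sha must not be empty")
	}
	return fmt.Sprintf("urn:gleaner.io:%s:%s:%s:%s", implnet, source, graphType, sha), nil
}

func main() {
	urn, err := mintURN("iow", "counties0", "data", "00010f9f071c39fcc0ca73eccad7470b675cd8a3")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(urn)
}
```

Called with the values above, the helper reproduces the second populated example.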

The milled and summoned elements were pointless and led to confusion and were not
really important in terms of getting to the object.

The new desired URN pattern is

```rdf
urn:gleaner.io:edmo:0255293683036aac2a95a2479cc841189c0ac3f8
```

## Consequences

This impacts gleaner in the generation of prov which will need to use this same pattern
to fill out the prov records.


Also, this means the URN does not actually represent the location of the object. Rather, the
client must know to go looking in summoned and/or milled. As noted, the use of milled is
not really compelling. That aside, it is a case where the URN is now just an identifier and
does not represent a resolvable object.
36 changes: 36 additions & 0 deletions decisions/0002-objectPrefixes-decision.md
@@ -0,0 +1,36 @@
# 1. Record architecture decisions

Date: 08-23-2023

## Status

Proposed

## Context

There are some conventions used in the leveraging of an object store by GleanerIO.
This ADR scopes the naming conventions used both by Gleaner and Nabu.

Some of these conventions have implications for the behavior of the code. For
example, the URN generation leverages the path structure to establish the
URN structure (see 0001-URN-decision.md).

Though the resulting URN is abstracted from the object prefix value, that prefix
is still used in the initial formation.

* graphs/
* graphs/archive
* graphs/latest
* graphs/summary
* summoned/
* prov/
* milled/
* orgs/
* reports/
* scheduler/

## Decision


## Consequences

8 changes: 8 additions & 0 deletions docs/httpSPARQL.md
@@ -16,3 +16,11 @@ curl -X POST -H 'Content-Type:application/n-quads' --data-binary @May4Buildings
```bash
curl -X POST -H 'Content-Type:text/x-nquads' --data-binary @May4Buildings.nq http://192.168.86.45:32772/blazegraph/namespace/loadtest/sparql
```

```bash
curl -H 'Accept: application/sparql-results+json' http://coreos.lan:9090/blazegraph/namespace/iow/sparql --data-urlencode 'query=select * where{ ?s ?p ?o } limit 10'
```

```bash
curl -H 'Accept: application/sparql-results+json' http://coreos.lan:9090/blazegraph/namespace/iow/sparql --data-urlencode 'query=SELECT (COUNT(DISTINCT ?graph) AS ?namedGraphsCount)(COUNT(*) AS ?triplesCount)WHERE {GRAPH ?graph {?subject ?predicate ?object}}'
```
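
The same query can be issued from Go. The sketch below only builds the request; `newQueryRequest` is a hypothetical helper name. Note one difference from the curl commands above: curl sends a POST with a URL-encoded form body when `--data-urlencode` is used without `-G`, whereas this sketch places the query in the URL and uses GET, which matches the `method: GET` setting for the `sparql` action in the example config.

```go
package main

import (
	"fmt"
	"net/http"
	"net/url"
)

// newQueryRequest builds a SPARQL query request that carries the query in
// the `query` URL parameter and asks for JSON results.
func newQueryRequest(endpoint, query string) (*http.Request, error) {
	u, err := url.Parse(endpoint)
	if err != nil {
		return nil, err
	}
	q := u.Query()
	q.Set("query", query)
	u.RawQuery = q.Encode()

	req, err := http.NewRequest(http.MethodGet, u.String(), nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Accept", "application/sparql-results+json")
	return req, nil
}

func main() {
	req, err := newQueryRequest(
		"http://coreos.lan:9090/blazegraph/namespace/iow/sparql",
		"select * where{ ?s ?p ?o } limit 10",
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	// Print what would be sent; pass req to http.DefaultClient.Do to execute it.
	fmt.Println(req.Method, req.URL.String())
}
```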
77 changes: 77 additions & 0 deletions docs/images/workflow.d2
@@ -0,0 +1,77 @@
direction: right


gi: Get Image(s) {
style.fill: "#e0a3ff"
width: 200
height: 150
}

g: Gleaner Harvest {
style.fill: honeydew
width: 200
height: 150
}

data: Data Graph {
dr: Build Release Graph {
style.fill: "#f4a261"
width: 300
}
udr: Load Release Graph {
style.fill: "#f4a261"
width: 300
}

dp: Prune {
style.fill: "#f4a261"
width: 300
}
}

org: Organization Graph {
or: Prefix Load Graph {
style.fill: "#f4a261"
width: 300
}

# uor: Load Release Graph {
# style.fill: "#f4a261"
# width: 300
# }

# op: Prune {
# style.fill: "#f4a261"
# width: 300
# }

# org.or -> org.uor -> org.op

}

prov: Provenance Graph {
pr: Build Release Graph {
style.fill: "#f4a261"
width: 300
}
dpg: Clear Current Graph {
style.fill: "#f4a261"
width: 300
}
upr: Load Release Graph {
style.fill: "#f4a261"
width: 300
}
dpp: Delete Generated Data Graphs {
style.fill: "#f4a261"
width: 300
}
}

gi -> g
g -> org.or
g -> data.dr
g -> prov.pr

data.dr -> data.udr -> data.dp
prov.pr -> prov.dpg -> prov.upr -> prov.dpp
102 changes: 102 additions & 0 deletions docs/images/workflow.svg
15 changes: 8 additions & 7 deletions docs/sparql/countAllGraphs.rq
@@ -1,11 +1,12 @@
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT (COUNT( DISTINCT ?g) AS ?graphs)

SELECT (COUNT(DISTINCT ?graph) AS ?namedGraphsCount)
(COUNT(*) AS ?triplesCount)
WHERE {
graph ?g {
?s ?p ?o
GRAPH ?graph {
?subject ?predicate ?object
}
}



SELECT (COUNT(DISTINCT ?graph) AS ?namedGraphsCount)(COUNT(*) AS ?triplesCount)WHERE {GRAPH ?graph {?subject ?predicate ?object}}
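
When this query is requested with `Accept: application/sparql-results+json`, the two counts come back as string-valued bindings in the standard SPARQL JSON results layout. The Go sketch below shows one way to decode them. `parseCounts` is a hypothetical helper, and the sample response in `main` uses made-up numbers, not output from a real store.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strconv"
)

// binding is one variable value in a SPARQL JSON result row.
type binding struct {
	Type  string `json:"type"`
	Value string `json:"value"`
}

// sparqlResults covers only the part of the response this sketch needs.
type sparqlResults struct {
	Results struct {
		Bindings []map[string]binding `json:"bindings"`
	} `json:"results"`
}

// count reads one integer-valued variable from a result row.
func count(row map[string]binding, name string) (int64, error) {
	b, ok := row[name]
	if !ok {
		return 0, fmt.Errorf("missing variable %q", name)
	}
	n, err := strconv.ParseInt(b.Value, 10, 64)
	if err != nil {
		return 0, fmt.Errorf("variable %q: %w", name, err)
	}
	return n, nil
}

// parseCounts extracts ?namedGraphsCount and ?triplesCount from the single
// row the aggregate query returns.
func parseCounts(body []byte) (graphs, triples int64, err error) {
	var r sparqlResults
	if err = json.Unmarshal(body, &r); err != nil {
		return 0, 0, err
	}
	if len(r.Results.Bindings) != 1 {
		return 0, 0, fmt.Errorf("expected one result row, got %d", len(r.Results.Bindings))
	}
	row := r.Results.Bindings[0]
	if graphs, err = count(row, "namedGraphsCount"); err != nil {
		return 0, 0, err
	}
	if triples, err = count(row, "triplesCount"); err != nil {
		return 0, 0, err
	}
	return graphs, triples, nil
}

func main() {
	// Made-up sample response for illustration only.
	sample := []byte(`{"head":{"vars":["namedGraphsCount","triplesCount"]},
	  "results":{"bindings":[{
	    "namedGraphsCount":{"type":"literal","value":"3"},
	    "triplesCount":{"type":"literal","value":"42"}}]}}`)

	graphs, triples, err := parseCounts(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("named graphs:", graphs, "triples:", triples)
}
```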
35 changes: 4 additions & 31 deletions go.mod
@@ -3,47 +3,20 @@ module github.com/gleanerio/nabu
go 1.15

require (
github.com/bbalet/stopwords v1.0.0
github.com/blevesearch/bleve v1.0.14 // indirect
github.com/buger/jsonparser v1.1.1 // indirect
github.com/coreos/bbolt v1.3.2 // indirect
github.com/coreos/etcd v3.3.13+incompatible // indirect
github.com/coreos/go-systemd v0.0.0-20190321100706-95778dfbb74e // indirect
github.com/coreos/pkg v0.0.0-20180928190104-399ea9e2e55f // indirect
github.com/coyove/jsonbuilder v0.0.0-20160414062945-90ee6d2c3c43 // indirect
github.com/dgrijalva/jwt-go v3.2.0+incompatible // indirect
github.com/gleanerio/gleaner v0.0.0-20211103190335-f9d8811ee43b // indirect
github.com/go-ini/ini v1.62.0 // indirect
github.com/gosuri/uilive v0.0.4 // indirect
github.com/gosuri/uiprogress v0.0.1
github.com/grpc-ecosystem/go-grpc-middleware v1.0.0 // indirect
github.com/grpc-ecosystem/go-grpc-prometheus v1.2.0 // indirect
github.com/jonboulle/clockwork v0.1.0 // indirect
github.com/kisielk/godepgraph v0.0.0-20190626013829-57a7e4a651a9 // indirect
github.com/google/uuid v1.2.0 // indirect
github.com/knakk/rdf v0.0.0-20190304171630-8521bf4c5042
github.com/meilisearch/meilisearch-go v0.21.1
github.com/minio/minio-go v6.0.14+incompatible
github.com/minio/minio-go/v7 v7.0.15
github.com/neuml/txtai.go v1.0.0
github.com/orandin/lumberjackrus v1.0.1
github.com/paulmach/go.geojson v1.4.0 // indirect
github.com/piprate/json-gold v0.4.0
github.com/prometheus/client_golang v0.9.3 // indirect
github.com/protolambda/gocyto v0.0.1 // indirect
github.com/piprate/json-gold v0.5.0
github.com/pquerna/cachecontrol v0.1.0 // indirect
github.com/rs/xid v1.2.1
github.com/schollz/progressbar v1.0.0
github.com/schollz/progressbar/v3 v3.8.3
github.com/sirupsen/logrus v1.8.1
github.com/soheilhy/cmux v0.1.4 // indirect
github.com/spf13/cobra v1.2.1
github.com/spf13/viper v1.9.0
github.com/tidwall/gjson v1.14.2
github.com/tidwall/sjson v1.2.5
github.com/tmc/grpc-websocket-proxy v0.0.0-20190109142713-0ad062ec5ee5 // indirect
github.com/xiang90/probing v0.0.0-20190116061207-43a291ad63a2 // indirect
golang.org/x/text v0.3.7
gopkg.in/check.v1 v1.0.0-20201130134442-10cb98267c6c // indirect
gopkg.in/natefinch/lumberjack.v2 v2.0.0 // indirect
gopkg.in/redis.v5 v5.2.9 // indirect
gopkg.in/resty.v1 v1.12.0 // indirect
honnef.co/go/tools v0.1.2 // indirect
)