Skip to content

Latest commit

 

History

History
184 lines (160 loc) · 4.47 KB

README.md

File metadata and controls

184 lines (160 loc) · 4.47 KB

PDQdb

A read-optimized, in-memory, columnar store database (column-oriented DBMS). It's pretty Damn Quick.

Instructions

  1. Install goLang on your computer
  2. $ git clone https://github.com/adam-hanna/PDQdb.git
  3. $ make
  • Only builds for Linux and MacOS, for now
  1. $ PDQdb -f "path/to/your/file.csv" -c "path/to/your/config.json"
  2. http api (returns json)
  • Grab data by key: curl -v -XGET http://127.0.0.1:38216/key/{your key}
  • Count keys: curl -v -XGET http://127.0.0.1:38216/count
  • Grab data by query: see the section on weeQL

weeQL

weeQL is the query language written for PDQdb. It was inspired by SQL and implements many of the features of the SQL language; however, it is a very lightweight implementation and does not include all the functionality of SQL. Pronounced "wee" + "quill". Get it? Wee SQL? Tiny SQL?

All weeQL queries are made by sending a json query object using the http POST method to http://127.0.0.1:38216/query

The json query object is structured as follows:

{
  "SELECT":  [ "COL1", "COL2", ... ], // SITUATIONAL
  "COUNT": "*", // SITUATIONAL
  "WHERE":   { "FIELD1": "VAL1", "FIELD2": "VAL2", ... }, // see the section on where
  "GROUP BY": "COL1" // OPTIONAL
}

The "WHERE" property supports multiple logical operators:

  • `"$OR": [ { "FIELD1": "VAL1" }, {"FIELD2": "VAL2" }, ... ]`
  • `"$NOT": { "FIELD1": "VAL1", "FIELD2": "VAL2", ... }`
  • `"$NOR": { "FIELD1": "VAL1", "FIELD2": "VAL2", ... }`; This is the same as `"$NOT": { "$OR": [ ... ] }`
  • `"FIELD1": "$IN": [ "VAL1", "VAL2", ... ]`
  • `"FIELD1": "$NIN": [ "VAL1", "VAL2", ... ]` This is the same as `"$NOT": { "$IN": [ ... ] }`

Exporting and aggregating is explained in more detail, below.

1. Exporting Data

Exporting data is done with the `"SELECT"` query parameter (omit `"COUNT"` and `"GROUP BY"`)
Properties
  • `"SELECT" : [ “FIELD 1”, “FIELD 2”, … ]`: SITUATIONAL. An array of strings that indicates the columns to be returned. Omitted if using `"COUNT"`!
  • `"WHERE": { "FIELD1": "VAL1", "FIELD2": "VAL2", ... }`: a subdocument of filters. Multiple filters are returned as the intersection of data that meet each criteria (i.e. "FIELD1" = "VAL1" AND "FIELD2" = "VAL2"). Other logical operators besides AND are available. See the section on `"WHERE"` for more info.
example
query: ``` { SELECT : [ “ID” ], WHERE: { “A”: “foo” } } ```

yields:

[
  {
    "ID": "1"
  },
  {
    "ID": "3"
  },
  {
    "ID": "4"
  }
]

2. Aggregation

Perform counts on data with query parameters and an optional `"GROUP BY"` command (omit `"SELECT"`)
Properties
  • `"COUNT": "*"`: SITUATIONAL. The only value currently supported is "*". Omitted if using `"SELECT"`!
  • `"WHERE": { "FIELD1": "VAL1", "FIELD2": "VAL2", ... }`: a subdocument of filters. Multiple filters are returned as the intersection of data that meet each criteria (i.e. "FIELD1" = "VAL1" AND "FIELD2" = "VAL2"). Other logical operators besides AND are available. See the section on `"WHERE"` for more info.
  • `"GROUP BY": "COL1"`: OPTIONAL. Only valid with `"COUNT"` queries. This is the string column name by which to group count results.
examples
query #1:
{
  “COUNT”: “*”,
  "WHERE": {
    “A”: “foo”
  }
}

yields:

{ "COUNT": 3 }

query #2 with "GROUP BY"

{
  “COUNT”: “*”,
  "WHERE": {
      “A”: “foo”
  },
  “GROUP BY”: “B”
}

yields:

{
  "bar": 0,
  "barbar": 0,
  "barbarbar": 0,
  "test": 1,
  "testtest": 1,
  "testtesttest": 1
}

query #3 with "GROUP BY" and "$OR":

{
  “COUNT”: “*”,
  "WHERE": {
    "A": "foo",
    "$OR": [
      { "B": "testtesttest" },
      { "B": "testtest" }
    ]
  },
  “GROUP BY”: “B”
}

yields:

{
  "bar": 0,
  "barbar": 0,
  "barbarbar": 0,
  "test": 0,
  "testtest": 1,
  "testtesttest": 1
}

Example data

###example.csv

ID,A,B
1,foo,test
2,foobar,testtest
3,foo,testtesttest
4,foo,testtest
5,foobarbar,bar
6,foofoo,barbar
7,foofoo,barbarbar

###example_config.json

{
  "header": [
    {"ID": "string"},
    {"A": "string"},
    {"B": "string"}
  ],
  "id_field": "ID",
  "index_fields": ["A", "B"],
  "start_at_line": 2
}