This repo contains a tool to process and summarize the query logs that graph-node generates. It can be used to
- transform the textual log files that graph-node produces into JSONL files with one JSON object for each query. The input can either be the raw log from graph-node or the log entries that Google Cloud's Stackdriver emits. The JSONL files can then be used as input to agora
- summarize query performance for similar queries to calculate basic statistics like average and maximum query duration
- sample query logs to generate a random sample of a larger logfile
You will need to have a Rust toolchain installed.
Check out this repository, cd into the checkout, and run
cargo install --bins --path .
The resulting binary will be placed into ~/.cargo/bin. If you want to
update an existing installation, you will also have to pass --force to
cargo.
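Putting the two together, updating an existing installation amounts to rerunning the install command with the extra flag:
cargo install --bins --path . --force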
When you run graph-node, set GRAPH_LOG_QUERY_TIMING to gql. That will
make graph-node log lines of the form
Sep 22 17:01:52.521 INFO Query timing (GraphQL), block: 1234567, query_time_ms: 489, variables: null, query: query things { things(first: 10) { id name } } , query_id: f3e751f1852e62e6-cf1efbcd35771aeb, subgraph_id: Qmsubgraph, component: GraphQlRunner
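As a minimal sketch, when graph-node is launched from a shell the variable can be set like this (adapt to however you actually deploy graph-node, e.g. a systemd unit or container spec):
export GRAPH_LOG_QUERY_TIMING=gql
# ...then start graph-node with your usual command; query timings will now be logged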
qlog can then be used to transform and summarize these logs in a number
of ways.
In its simplest form, qlog process reads a text file containing log
entries and transforms them into line-separated JSON (JSONL). To do that,
run
grep 'Query timing' queries.log | qlog process --text --output queries.jsonl
In addition, qlog can summarize queries by grouping similar queries
by their 'shape hash'; the summary is written to the file given with
--graphql. For example, to just summarize queries, run
grep 'Query timing' queries.log | qlog process --text --graphql summary.jsonl
If there is already a file with queries in JSONL form, qlog process can
summarize this with the following command. Note that we do not pass
--text to qlog process since the input file is already in JSONL form:
cat queries.jsonl | qlog process --graphql summary.jsonl
Finally, qlog process can also be used to take fixed-size samples of a
logfile; running qlog process like this will produce a file
samples.jsonl that contains 1,000 samples per subgraph taken
uniformly from queries.jsonl:
cat queries.jsonl | qlog process --samples 1000 --sample-file samples.jsonl
Sampling can also be restricted to certain subgraphs with the
--sample-subgraphs option, which expects a comma-separated list of
subgraph identifiers, i.e. identifiers in the form Qmsubgraph.
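As a sketch, restricting the sampling above to two subgraphs would look like this (the two identifiers below are placeholders):
cat queries.jsonl | qlog process --samples 1000 --sample-file samples.jsonl \
      --sample-subgraphs Qmsubgraph1,Qmsubgraph2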
The options for converting a logfile to JSONL, to summarize a logfile, and
to sample a logfile can be combined so that qlog process only needs to be
run over a logfile once, which saves a significant amount of time for large
logfiles:
grep 'Query timing' queries.log | \
qlog process --text --graphql summary.jsonl \
--output queries.jsonl \
--samples 1000 --sample-file samples.jsonl
Google Cloud's logging infrastructure, Stackdriver, wraps log messages in JSON objects. It is possible to have these JSON objects delivered into files in Google Cloud storage by defining a sink in Google Cloud Logging.
The command qlog extract can be used to parse such files and turn them
into JSONL log files, similar to what qlog process with the --output
option produces. Run
cat stackdriver.log | qlog extract --graphql queries.jsonl -
to turn a Stackdriver logfile into a JSONL logfile that qlog process can
summarize and sample.
The command qlog combine can be used to combine multiple summary files
into one by running
qlog combine summary1.jsonl summary2.jsonl ... > summary.jsonl
With a summary file summary.jsonl produced by qlog process in hand, it
is possible to perform simple analysis tasks with qlog stats and qlog query. For example, the command
qlog stats -s total -f summary.jsonl
will list query details sorted by their total time.
qlog stats can also provide an overview of queries in tabular form, for
example by running qlog stats summary.jsonl | head -n 10:
| QID | calls | total | avg | stddev | max | slow |
|---------+----------+--------------+--------+--------+--------+--------|
| Q000019 | 933741 | 430758440 | 461 | 284 | 7998 | 30440 |
| Q000511 | 3153187 | 412610852 | 131 | 137 | 3991 | 13506 |
| Q000534 | 360948 | 152649245 | 423 | 282 | 10689 | 9162 |
| Q000343 | 1291109 | 35364900 | 27 | 40 | 2918 | 19 |
| Q000829 | 4269 | 33130650 | 7761 | 1116 | 17533 | 4269 |
| Q000527 | 785074 | 32840740 | 42 | 45 | 1974 | 27 |
| Q001673 | 24366 | 27497870 | 1129 | 492 | 7168 | 10180 |
| Q001453 | 106500 | 24405334 | 229 | 361 | 6884 | 2827 |
All times in this table are in ms; the slow column counts queries that took longer than 1s.
The query subcommand can be used to print more details about a specific
query (the leading zeros in the QNNN identifier are optional):
host:qlog> qlog query summary.jsonl Q558 Q333
================================ Q558 ================================
# subgraph: QmaTK1m8VszFp7iijbWrX65iN8e5zogvJYvUAck7HEvAtQ
# calls: 1
# slow_count: 1
# slow_percent: 100.00 %
# total_time: 11.5 m
# avg_time: 692850 ms
# stddev_time: 0 ms
# max_time: 692850 ms
# max_uuid: 160e5f6b-f230-428a-bd3b-8cae6cbb5d89
query getTokens($pageSize: Int = 50) { tokens(first: $pageSize) { address name symbol decimals events { ... on TransferEvent { amount sender destination } } } }
================================ Q333 ================================
# subgraph: QmVEoWSQ8eNnVkL1uw2toDSc1xXKi1ADYUm36p961Q1EzT
# calls: 692
# slow_count: 10
# slow_percent: 1.45 %
# total_time: 18.7 m
# avg_time: 1625 ms
# stddev_time: 14798 ms
# max_time: 158087 ms
# max_uuid: e956ea45-807d-43bb-ad4b-4adcd04bc281
query getSubdomains($id: ID!) { domain(id: $id) { id labelName subdomains { id labelName labelhash name owner { id __typename } __typename } __typename } }
If we have a directory with summary files YYYY-MM-DD.jsonl for each day,
we can list the total number of queries per day using jq:
for f in *.jsonl
do
day=$(basename $f .jsonl)
gql=$(jq -s 'map(.calls) | add ' < $f)
printf "%s %10s\n" "$day" "$gql"
done
This saves the number of queries per subgraph per day to a CSV file:
summaries=""
for f in *.jsonl
do
day=$(basename $f .jsonl)
day_summary=$(jq -s 'group_by(.subgraph) |
map({"subgraph":(.[0].subgraph), "calls":(reduce .[].calls as $calls (0; . + $calls)), "date":("'$day'") }) |
map([.subgraph, .calls, .date] | join(", ")) | join(" \r\n ")' < $f)
summaries+="${day_summary//\"} \r\n"
done
printf "${summaries//\"}" > subgraph_daily_summary.csvThe JSONL files that qlog process --graphql produces contain a list of
JSON objects, one for each query that graph-node responded to with the
following entries:
- subgraph: the IPFS hash of the subgraph against which the query was run
- query_id: an entry of the form <shape hash>-<query hash>, where the shape hash is the hash of the query text when disregarding concrete filter values in the query, so that a query with first: 10 and one with first: 100 produce the same shape hash. The query hash is the hash of the query text and the variables passed to the query.
- block: the number of the block against which the query was executed
- time: how long processing the query took in ms
- query: the text of the GraphQL query
- variables: the variables used in the query as a string that is a JSON object in its own right
- timestamp: the server time when the query was run
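For illustration, the log line shown earlier would roughly turn into a record like the comment below (schematic and abbreviated; the exact encoding, e.g. of the timestamp, may differ):
head -n 1 queries.jsonl
# {"subgraph":"Qmsubgraph","query_id":"f3e751f1852e62e6-cf1efbcd35771aeb","block":1234567,"time":489,"query":"query things { things(first: 10) { id name } }","variables":null,"timestamp":"..."}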
The summary JSONL files produced with qlog process --graphql contain JSON
objects with the following entries:
- query: an example of the query being summarized. Queries are summarized by their shape hash, i.e. the summary file contains one entry for each unique shape in the original log file
- subgraph: the IPFS hash of the subgraph
- calls: the number of times this query shape was executed
- slow_count: the number of executions of this query shape that took more than 1s
- total_time: the sum of the execution time of all queries being summarized in ms
- time_squared: the sum of the square of the execution time of queries in ms^2
- max_time: the time that the slowest execution of this query shape took in ms
- max_uuid: the query_id of a query that took max_time (it's called uuid for historical reasons)
- max_variables: the variables that were passed to the query when it took max_time
- max_complexity: meaningless; only there for historical reasons
- id, hash: used by qlog for internal bookkeeping
Average query execution time can be calculated from this data as
total_time / calls and the standard deviation as sqrt(time_squared / calls - (total_time/calls)^2).
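As a small sketch, assuming jq is available, both figures can be computed directly from a summary file (the field names are those listed above; avg_ms and stddev_ms are just output labels):
jq -c '(.total_time / .calls) as $avg
       | {query, calls, avg_ms: $avg, stddev_ms: (((.time_squared / .calls) - $avg * $avg) | sqrt)}' \
   summary.jsonl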
For more complex analysis, it is best to load summaries into a dedicated
Postgres database. In the following, we assume that the environment is set
up in such a way that running psql will connect to that dedicated
database. The command ./bin/load -c will create a database schema qlog
for that purpose.
Assuming there is a directory daily-summaries with one summary file
called YYYY-MM-DD.jsonl for each day, simply run
./bin/load daily-summaries/*.jsonl
Subsequent days can be added to the database with ./bin/load daily-summaries/2020-11-01.jsonl
Once the summary data is loaded into Postgres, we can run even more complex queries, for example, to find the subgraphs causing the most work:
host:qlog> psql -c '
> select subgraph,
> sum(total_time::numeric)/(24*60*60*1000) as total_time_days,
> sum(calls) as calls
> from qlog.data
> group by subgraph
> order by 2 desc
> limit 5;'
Copyright © 2021 The Graph Foundation
Licensed under the MIT license.