-
Notifications
You must be signed in to change notification settings - Fork 6
/
workloads.tex
133 lines (111 loc) · 8.36 KB
/
workloads.tex
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
\chapter{Workloads}
\label{sec:workloads}
\section{Query Annotations}
This section describes how to read the query cards in the following sections.
\subsection{Query Description Format}
\label{subsec:query-description-format}
Queries are described in natural language using a well-defined structure that consists of three sections:
\textit{description}, a concise textual description of the query,
\textit{parameters}, a list of input parameters and their types;
\textit{results}, a list of expected results and their types.
Additionally, queries returning multiple results specify \emph{sorting criteria} and a \emph{limit} (to return top-$k$ results).
We use the following notation:
\begin{itemize}
\item \textbf{Vertex type}: vertice type in the dataset.
One word, possibly constructed by appending multiple words together, starting with an uppercase character and following the camel case notation,
\eg \textsf{TagClass} represents an entity of type ``TagClass''.
\item \textbf{Edge type}: edge type in the dataset.
One word, possibly constructed by appending multiple words together, starting with a lowercase character and following the camel case notation
\eg \mbox{\textsf{workAt}} represents an edge of type ``workAt''.
\item \textbf{Attribute}: attribute of a vertice or an edge in the dataset.
One word, possibly constructed by appending multiple words together, starting with a lowercase character and following the camel case notation,
and prefixed by a ``.'' to dereference the vertice/edge,
\eg \textsf{person.firstName} refers to ``firstName'' attribute on the ``person'' entity,
and \mbox{\textsf{studyAt.classYear}} refers to ``classYear'' attribute on the ``studyAt'' edge.
\item \textbf{Unordered Set}: an unordered collection of distinct elements.
Surrounded by \{ and \} braces, with the element type between them,
\eg \textsf{\{String\}} refers to a set of strings.
\item \textbf{Ordered List}: an ordered collection where duplicate elements are allowed.
Surrounded by [ and ] braces, with the element type between them,
\eg \textsf{[String]} refers to a list of strings.
\item \textbf{Ordered Tuple}: a fixed-length, fixed-order list of elements, where elements at each position of the tuple have predefined, possibly different, types.
Surrounded by < and > braces, with the element types between them in a specific order
\eg \textsf{<String, Boolean>} refers to a 2-tuple containing a string value in the first element and a boolean value in the second,
and \textsf{[<String, Boolean>]} is an ordered list of those 2-tuples.
\end{itemize}
\paragraph{Categorization of results.}
Results are categorized according to their source of origin:
\begin{itemize}
\item \textbf{Raw} (\texttt{R}), if the result attribute is returned with an unmodified value and type.
\item \textbf{Calculated} (\texttt{C}), if the result is calculated from attributes using arithmetic operators, functions, boolean conditions, etc.
\item \textbf{Aggregated} (\texttt{A}), if the result is an aggregated value, \eg a count or a sum of another value. If a result is both calculated and aggregated (\eg \lstinline{count(x) + count(y)} or \lstinline{avg(x + y)}), it is considered an aggregated result.
\item \textbf{Meta} (\texttt{M}), if the result is based on type information, \eg the type of a vertice.
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Returned Values}
\label{subsec:returned-values}
Return values are subject to the following rules:
\begin{itemize}
\item Path type. The Path type is a sequence of vertices and edges. The
Path type is returned as a sequence of vertex and edge identifiers
ignoring the multiple edges between the same src and dst vertex.
\item Precision of results. In order to maintain consistency of the
benchmark results, all floating-point results are rounded to 3
decimal places using standard rounding rules (i.e., round half up).
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\subsection{Other Annotations}
\label{subsec:other-annotations}
To express the patterns better, the pattern diagrams are drawn from the perspective
of data rather than the matching pattern in the graph. Here are some annotations to each
query card in this section.
\begin{itemize}
\item Each row in the result cell represents an attribute to be returned.
\item The second column means the data type of returned attribute.
If the type is surrounded by \type{\{\}}, it means that the result is a
set, \eg \type{\{String\}} means a string set is returned.
\item For each row in the result cell, the third column annotates the
category of type of result attribute returned, including \texttt{R} short
for Raw, \texttt{A} short for Aggregated, \texttt{C} short for Calculated,
\texttt{S} short for Structural. Among them, structural type means types
such as \type{Path} while raw type means basic types in contrast.
\item In the pattern of each query, the gray dashed box encapsulates the results
to return. And the black solid arrows represent the multiple edges from src
to dst while the black dashed arrows represent the single edges from src to dst.
\end{itemize}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Truncation on Hub Vertices}
\label{sec:truncation-on-hub-vertices}
The high degree of hub vertex is a common feature not only in financial scenarios but also in other scenarios, which is
an inevitable challenge that systems face. To solve the problem, systems can either improve the performance to satisfy
the computation or just reduce the complexity to meet the latency requirements.
The mechanism is to do truncation on the edges when traversing out from the current vertex, which complies with the
discordance. Truncating less-important edges is a useful and practical mechanism to handle the discordance between the
tight latency requirements and hub vertices in the system, where the degree of hub vertex may reach a million and even
billion scales, especially when traversing the graph. To maintain the consistency of the results, a sort order has
to be specified when truncating. Since in financial graphs, users prefer newer data in business. It is reasonable that
attribute, \emph{timestamp}, in the edges is used as the sort order in truncation. With the sort order, truncation is
namely a deterministic sampling in traversing.
In the following queries, some parameters are added to describe the behavior of truncation reducing the complexity
including the \emph{TRUNCATION\_LIMIT} and \emph{TRUNCATION\_ORDER}. \emph{TRUNCATION\_ORDER} can be
\emph{TIMESTAMP\_ASCENDING, TIMESTAMP\_DESCENDING, AMOUNT\_ASCENDING, AMOUNT\_DESCENDING}. At most time,
\emph{TRUNCATION\_ORDER} is set to \emph{TIMESTAMP\_DESCENDING} by default.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Read Write Query}
\label{sec:read-write-query}
In financial scenarios, risk control is a kind of hot and significant application.
Such applications usually detect a specific pattern in the form of linked data before
new records like transfers are written to systems. Read-write query, which can also
be seen as transaction-wrapped strategies, fits these applications very well since
users do not need to worry about translating the patterns to prevent malicious records.
A read-write query is composed of read queries and write queries in the previous sections.
In most cases, whether to commit the write query depends on the detection result of the
read queries. In the initial version, just 3 read-write queries are presented.