Commit 751964c

docs: create contrib tutorial with extension capabilities examples
1 parent 994f91a commit 751964c

12 files changed

+309
-0
lines changed

docs/create_extension.md

Lines changed: 307 additions & 0 deletions
@@ -0,0 +1,307 @@
# Tutorial: creating an extension

In this tutorial we will create an extension called `ban_sus_query`. It checks that DML queries contain predicates and throws an error if they do not.

> To avoid confusion, from here on I will use the term `contrib` for a PostgreSQL extension and `extension` for the PostgreSQL Hacker Helper VS Code extension.

## Creating initial files

PostgreSQL has infrastructure for building and installing contribs. In short, contribs share a template architecture - most parts are common to all of them.

So, to create a contrib faster, we will use the command `PgSQL: Bootstrap extension`.

![Bootstrap extension command](img/create_extension/bootstrap_extension.png)

It will prompt us to choose which files to bootstrap - choose only C sources.

After that, our contrib files will be created:

![README.md with directory contents](img/create_extension/initial_dir_content.png)

## Initial code

The query execution pipeline has 3 stages:

1. Parse/Semantic analysis - query string parsing and resolving tables
2. Plan - query optimization and creating execution plan
3. Execution - actual query execution

Our logic will be added to the second stage, because we must check the real execution plan, not the `Query`.
This is because a query can be changed in many ways by multiple transformations - predicates can be removed or added, so we may end up with a completely different query than the one in the original query string.

To implement that, we will install a hook on the planner - `planner_hook`. Inside it we will invoke the actual planner and check its output for the existence of predicates.

Starter code is the following:

```c
#include "postgres.h"

#include "fmgr.h"
#include "optimizer/planner.h"

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

/* Previously installed planner hook (if any), so that hooks can be chained */
static planner_hook_type prev_planner_hook;

void _PG_init(void);
void _PG_fini(void);

static bool
is_sus_query(Plan *plan)
{
    /* ... */
    return false;
}

static PlannedStmt *
ban_sus_query_planner_hook(Query *parse,
                           const char *query_string,
                           int cursorOptions,
                           ParamListInfo boundParams)
{
    PlannedStmt *stmt;

    /* Run the previous hook if there is one, otherwise the standard planner */
    if (prev_planner_hook)
        stmt = prev_planner_hook(parse, query_string, cursorOptions, boundParams);
    else
        stmt = standard_planner(parse, query_string, cursorOptions, boundParams);

    if (is_sus_query(stmt->planTree))
        ereport(ERROR,
                (errmsg("DML query does not contain predicates")));

    return stmt;
}

void
_PG_init(void)
{
    /* Save the current hook and install ours */
    prev_planner_hook = planner_hook;
    planner_hook = ban_sus_query_planner_hook;
}

void
_PG_fini(void)
{
    /* Restore the hook that was active before we were loaded */
    planner_hook = prev_planner_hook;
}
```
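
The save-and-chain pattern used in `_PG_init` and the hook can be tried outside PostgreSQL. Below is a minimal standalone sketch; all names in it (`hook_type`, `standard_impl`, `my_hook`, `run`, `module_init`, `module_fini`) are hypothetical stand-ins, and integers replace the real `Query` and `PlannedStmt` types.

```c
#include <stddef.h>

/* Hypothetical stand-in for planner_hook_type: takes a "query", returns a "plan" */
typedef int (*hook_type)(int query);

/* Global hook pointer, NULL while no module has installed a hook */
static hook_type hook = NULL;
/* Hook that was installed before ours */
static hook_type prev_hook = NULL;

/* Default behaviour, playing the role of standard_planner */
int
standard_impl(int query)
{
    return query * 2;
}

/* Our hook: delegate to the previous hook or the default, then post-process */
int
my_hook(int query)
{
    int result;

    if (prev_hook)
        result = prev_hook(query);
    else
        result = standard_impl(query);

    return result + 1;
}

/* Entry point used by the "core": call the hook if one is installed */
int
run(int query)
{
    if (hook)
        return hook(query);
    return standard_impl(query);
}

/* Counterparts of _PG_init and _PG_fini */
void
module_init(void)
{
    prev_hook = hook;
    hook = my_hook;
}

void
module_fini(void)
{
    hook = prev_hook;
}
```

Saving the previous pointer instead of overwriting it is what lets several modules install a hook on the same entry point without disabling each other.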

Now we are ready to add the "business logic", but first let's understand what such suspicious queries look like.

## Examine queries

A suspicious query is a DELETE/UPDATE query that does not contain predicates.

One of the benefits of checking already planned statements is that all predicates are already optimized, in the sense that boolean simplification rules have been applied.

A query plan is a tree of `Plan` nodes. Each `Plan` contains `lefttree`/`righttree` - left and right children - and `qual` - a list of predicates to apply at this node. But we must check only UPDATE/DELETE nodes, not every node - the node for them is `ModifyTable`.

Thus our goal is:

> traverse the plan tree, find `ModifyTable` and check that its `qual` is not empty

But first, let's run sample queries to see what their plans look like inside and which predicates they have.

For tests we will use this setup:

```sql
-- Schema
CREATE TABLE tbl(x int);

-- Test queries
DELETE FROM tbl;
DELETE FROM tbl WHERE x = 0;

UPDATE tbl SET x = 1;
UPDATE tbl SET x = 1 WHERE x = 0;
```

To do this we will use our contrib - install it using `make install`, set `shared_preload_libraries='ban_sus_query'` in the server configuration and put a breakpoint on the `return` in the `ban_sus_query_planner_hook` function.

When we run the first DELETE query, without a predicate, we will see the following:

1. `PlannedStmt` contains a top-level `ModifyTable` with an empty `qual` list

![qual is empty in ModifyTable node](./img/create_extension/delete_no_pred_modifytable.png)

2. The inner `SeqScan` also contains an empty `qual` list

![qual is empty in SeqScan node](./img/create_extension/delete_no_pred_seqscan.png)

Now run the `DELETE` query with a predicate:

1. The top-level `ModifyTable` still contains an empty `qual` list

![qual is still empty in ModifyTable node](./img/create_extension/delete_pred_modifytable.png)

2. The inner `SeqScan` now contains `qual` with a single element - an equality predicate

![SeqScan has single equality predicate](./img/create_extension/delete_pred_seqscan.png)

This is no surprise: our `ModifyTable` does not apply any filtering - it just takes tuples from its child (note that, by convention, single-child nodes store the child in `lefttree`), so its `qual` is empty. The filtering is applied in `SeqScan`, so that is what we must check.

> As you may notice, the extension shows all Node variables with their actual types, instead of a generic `Plan` entry.
> The extension is also able to show the elements of container types (`List *` in this example).
> More than that, it renders `Expr` nodes (expressions) as they appeared in the query, so you do not have to manually check each field, trying to figure out what expression it is.
>
> In vanilla PostgreSQL you would have to evaluate 2 expressions: first get the `NodeTag`, then cast the variable to the type that the `NodeTag` denotes.
> In this example, to show `stmt->planTree` all you need to do is expand the tree node in the variables explorer, but manually (without the extension) you need to evaluate (e.g. in `watch`) 2 expressions:
>
> 1. `((Node *)planTree)->type` to get `T_ModifyTable` - the tag of the `ModifyTable` node, and then
> 2. `(ModifyTable *)planTree` - to show the variable with its real type.
>
> ![Evaluate variable in watch](./img/create_extension/nodetag_watch.png)
>
> Such manipulations take roughly 5 seconds each, but as this time accumulates it can add up to 1 hour a day - just to show a variable's contents!
>
> And there is no such workaround for `Expr` variables - you will not see their representation.
> For this you have to dump the variable to the log using the `pprint` function, which is not very convenient when you are developing in an IDE.

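The two manual steps from the note above (read the tag, then cast) work because every node struct starts with a tag field. A minimal standalone sketch of that idea follows; the `Mini*` types and the `MiniIsA` macro are hypothetical simplifications, not PostgreSQL's real `Node`, `NodeTag` or `IsA` definitions.

```c
#include <stddef.h>

/* Hypothetical, simplified tags; PostgreSQL's real NodeTag enum is much larger */
typedef enum MiniTag
{
    T_MiniSeqScan,
    T_MiniModifyTable
} MiniTag;

/* Every node starts with its tag, so any node pointer can be read as a MiniNode */
typedef struct MiniNode
{
    MiniTag type;
} MiniNode;

typedef struct MiniSeqScan
{
    MiniTag type;
    int scanrelid;
} MiniSeqScan;

typedef struct MiniModifyTable
{
    MiniTag type;
    int operation;
} MiniModifyTable;

/* Step 1: read the tag through the common header */
#define MiniIsA(ptr, name) (((const MiniNode *) (ptr))->type == T_##name)

/* Step 2: once the tag is known, cast to the concrete type and use its fields */
int
describe(void *node)
{
    if (MiniIsA(node, MiniModifyTable))
        return ((MiniModifyTable *) node)->operation;
    if (MiniIsA(node, MiniSeqScan))
        return ((MiniSeqScan *) node)->scanrelid;
    return -1;
}
```

In a debugger without the extension, both steps have to be typed by hand for every variable you want to inspect.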
Now we are ready to write some code.

## `is_sus_query` implementation

To repeat, our goal is to *traverse the plan tree, find `ModifyTable` and check that its `qual` is not empty*, but now we can refine it:

> Search for `ModifyTable` in the `Plan` tree and check that its children have a non-empty `qual` list

Since tree traversal is naturally recursive, we will use 2 recursive functions:

- `is_sus_query` - main function that traverses the plan tree to find a `ModifyTable` node, and when it finds one invokes...
- `contains_predicates` - function that checks whether the given `Plan` subtree contains any predicate

Let's start with `is_sus_query`. All we have to do here is check that the `Plan` is a `ModifyTable` and, if so, check that its children contain predicates.

Node type checking is a frequent operation, so the extension ships with some snippets - one of them is `isaif`, which expands to an `if(IsA())` check:

![isaif expansion](./img/create_extension/isaif_modifytable.png)

Once we have determined that it is a DML operation, we check that it is a DELETE or UPDATE, because `ModifyTable` is also used for other operations, e.g. `INSERT`. This is not hard - just check the `operation` member.

```c
static bool
is_sus_query(Plan *plan)
{
    /* ... */
    ModifyTable *modify = (ModifyTable *)plan;
    switch (modify->operation)
    {
        case CMD_UPDATE:
        case CMD_DELETE:
            /* Check predicates */
            break;
        default:
            break;
    }
    /* ... */
}
```

Now check that these operations contain predicates using the `contains_predicates` function (defined later).

Also, do not forget to handle recursion: call `is_sus_query` for the children and handle the end case (`NULL`).

The resulting function looks like this:

```c
/* Forward declaration; the function is defined in the next section */
static bool contains_predicates(Plan *plan);

static bool
is_sus_query(Plan *plan)
{
    /* Recursion end */
    if (plan == NULL)
        return false;

    if (IsA(plan, ModifyTable))
    {
        ModifyTable *modify = (ModifyTable *)plan;
        switch (modify->operation)
        {
            case CMD_UPDATE:
            case CMD_DELETE:
                /* Suspicious if nothing below ModifyTable filters rows */
                return !contains_predicates(modify->plan.lefttree);
            default:
                break;
        }
    }

    /* Handle recursion */
    return is_sus_query(plan->lefttree) || is_sus_query(plan->righttree);
}
```

## `contains_predicates` implementation

Now let's perform the actual check for predicates in `contains_predicates`. Inside this function we must check that the given `Plan` contains predicates.

But the situation is complicated by the fact that we are given only the base `Plan` and do not know the actual query. For example, this query:

```sql
DELETE FROM t1 USING t2 WHERE t1.x = t2.x;
```

will contain a JOIN in the `lefttree` of `ModifyTable`:

![Merge Join as child of ModifyTable](./img/create_extension/mergejoin_child_modifytable.png)

Thus we have to clarify what `contains_predicates` must check. To keep things simple, we will just look for the first node with any predicate.

```c
static bool
contains_predicates(Plan *plan)
{
    /* Recursion end */
    if (plan == NULL)
        return false;

    /* Any non-empty qual list counts as a predicate */
    if (plan->qual != NIL)
        return true;

    return contains_predicates(plan->lefttree) || contains_predicates(plan->righttree);
}
```
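
To see how the two functions cooperate without a running server, here is a minimal standalone model. It is only a sketch: `MiniPlan`, `NodeKind`, `Operation` and the `mini_*` functions are hypothetical stand-ins, and an integer counter `nquals` replaces the real `qual` list.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for PostgreSQL's plan nodes */
typedef enum NodeKind { NODE_SCAN, NODE_JOIN, NODE_MODIFY } NodeKind;
typedef enum Operation { OP_INSERT, OP_UPDATE, OP_DELETE } Operation;

typedef struct MiniPlan
{
    NodeKind kind;
    Operation operation;        /* meaningful only for NODE_MODIFY */
    int nquals;                 /* stands in for the length of the qual list */
    struct MiniPlan *lefttree;
    struct MiniPlan *righttree;
} MiniPlan;

/* True if any node in the subtree has at least one predicate */
bool
mini_contains_predicates(const MiniPlan *plan)
{
    if (plan == NULL)
        return false;
    if (plan->nquals > 0)
        return true;
    return mini_contains_predicates(plan->lefttree) ||
           mini_contains_predicates(plan->righttree);
}

/* True if an UPDATE/DELETE node has no predicate anywhere below it */
bool
mini_is_sus_query(const MiniPlan *plan)
{
    if (plan == NULL)
        return false;
    if (plan->kind == NODE_MODIFY &&
        (plan->operation == OP_UPDATE || plan->operation == OP_DELETE))
        return !mini_contains_predicates(plan->lefttree);
    return mini_is_sus_query(plan->lefttree) ||
           mini_is_sus_query(plan->righttree);
}
```

A modify node over a scan with `nquals == 0` models `DELETE FROM tbl` and is reported as suspicious, while the same tree with `nquals == 1` on the scan models `DELETE FROM tbl WHERE x = 0` and passes.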

## Testing

First things first - test on the example queries we defined above:

```sql
postgres=# delete from tbl;
ERROR: DML query does not contain predicates
postgres=# delete from tbl where x = 0;
DELETE 0

postgres=# update tbl set x = 0;
ERROR: DML query does not contain predicates
postgres=# update tbl set x = 0 where x = 0;
UPDATE 0
```

It's working as expected.

Also, because our check runs on the final plan, after the planner has simplified the query, we can handle more complicated cases, like:

```sql
postgres=# delete from tbl where true;
ERROR: DML query does not contain predicates
```

## Further improvements

This is just the beginning of the contrib, because there are lots of corner cases that are not handled.

For example, if we change `true` to `false` in the last query, we will still get an `ERROR`.
That is because the planner has realized that the query will not return any rows, so it replaced the scan with a "dummy" plan - a `Result` node with a `FALSE` one-time check, so nothing will be returned:

![dummy Result Node](./img/create_extension/dummy_result_node.png)
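
One possible way to cover this case, sketched under assumptions: in PostgreSQL the `Result` node keeps its one-time check in the `resconstantqual` field, so `contains_predicates` could also treat a `Result` with a non-empty `resconstantqual` as containing a predicate. The standalone model below only illustrates that rule; `MiniPlan`, `MiniKind`, `nconstquals` and `mini_contains_predicates` are hypothetical names, not the contrib's actual code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified plan model; not PostgreSQL's real structs */
typedef enum MiniKind { MINI_SCAN, MINI_RESULT } MiniKind;

typedef struct MiniPlan
{
    MiniKind kind;
    int nquals;         /* stands in for the length of Plan.qual */
    int nconstquals;    /* stands in for Result.resconstantqual (MINI_RESULT only) */
    struct MiniPlan *lefttree;
    struct MiniPlan *righttree;
} MiniPlan;

bool
mini_contains_predicates(const MiniPlan *plan)
{
    if (plan == NULL)
        return false;

    if (plan->nquals > 0)
        return true;

    /* A one-time check such as WHERE false still came from a user predicate */
    if (plan->kind == MINI_RESULT && plan->nconstquals > 0)
        return true;

    return mini_contains_predicates(plan->lefttree) ||
           mini_contains_predicates(plan->righttree);
}
```

With this rule a dummy `Result` produced by `WHERE false` would no longer be reported, while a `Result` without a one-time check is still judged by its children.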

## Result

So far we have seen how to quickly create a new contrib using a single command that creates all the necessary files.

To write some templated code, we used the `isaif` snippet to quickly add a check for the Node type.

Also, we have traversed the query plan tree and inspected its nodes without having to obtain the `NodeTag` and cast to the given type, which greatly speeds up debugging.

And as the icing on the cake, we saw expression representations of predicates. For our purposes this is not a very big deal, because the query contained only 1 predicate, but in large queries with dozens of different predicates it's a lifesaver.