# Tutorial: creating an extension

In this tutorial we will create the extension `ban_sus_query`. It checks that DML queries contain predicates and throws an error otherwise.

> To avoid confusion, from here on I will use the term `contrib` for a PostgreSQL extension and `extension` for the PostgreSQL Hacker Helper VS Code extension.

## Creating initial files

PostgreSQL has infrastructure for building and installing contribs. In short, contribs follow a common template: most of their parts are the same for every contrib.

So, to create the contrib faster, we will use the `PgSQL: Bootstrap extension` command.

It will offer to bootstrap several files. Choose only the C sources.

After that, our contrib files are created.

## Initial code

The query execution pipeline has 3 stages:

1. Parse/semantic analysis: parsing the query string and resolving tables
2. Planning: optimizing the query and creating the execution plan
3. Execution: actually executing the query

Our logic will be added to the 2nd stage, because we must check the real execution plan, not the `Query`.
After multiple transformations the query can change in many ways: predicates can be removed or added, so the plan may differ substantially from the original query string.

To implement this we will install a planner hook, `planner_hook`. Inside it we will invoke the actual planner (the previously installed hook or `standard_planner`) and check its output for predicates.

The starter code is the following:

```c
#include "postgres.h"

#include "fmgr.h"
#include "optimizer/planner.h"

#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif

/* Planner hook installed before ours, if any */
static planner_hook_type prev_planner_hook;

void _PG_init(void);
void _PG_fini(void);

/* Returns true for an UPDATE/DELETE plan without predicates */
static bool
is_sus_query(Plan *plan)
{
    /* ... */
    return false;
}

static PlannedStmt *
ban_sus_query_planner_hook(Query *parse,
                           const char *query_string,
                           int cursorOptions,
                           ParamListInfo boundParams)
{
    PlannedStmt *stmt;

    /* Build the plan with the previous hook or the standard planner */
    if (prev_planner_hook)
        stmt = prev_planner_hook(parse, query_string, cursorOptions, boundParams);
    else
        stmt = standard_planner(parse, query_string, cursorOptions, boundParams);

    /* Check the final plan tree */
    if (is_sus_query(stmt->planTree))
        ereport(ERROR,
                (errmsg("DML query does not contain predicates")));

    return stmt;
}

void
_PG_init(void)
{
    /* Chain our hook after any previously installed one */
    prev_planner_hook = planner_hook;
    planner_hook = ban_sus_query_planner_hook;
}

/* PostgreSQL does not unload loaded libraries, so in practice this is never called */
void
_PG_fini(void)
{
    planner_hook = prev_planner_hook;
}
```

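The hook chaining in `_PG_init` is plain C function-pointer plumbing, so it can be illustrated outside PostgreSQL. In this standalone sketch every name (`toy_planner_hook`, `toy_standard_planner`, `other_hook`, `our_hook`) is hypothetical, and a plain `int` stands in for both the query and the plan. It only shows why saving the previous hook lets several extensions stack.

```c
#include <stddef.h>

/* Hypothetical stand-in for planner_hook_type: maps a "query" to a "plan" */
typedef int (*toy_planner_fn)(int query);

/* Global hook, NULL when nobody installed one (plays the role of planner_hook) */
static toy_planner_fn toy_planner_hook = NULL;

/* Plays the role of standard_planner */
static int
toy_standard_planner(int query)
{
    return query * 10;
}

/* What the core does: use the hook if one is installed */
static int
toy_planner(int query)
{
    if (toy_planner_hook != NULL)
        return toy_planner_hook(query);
    return toy_standard_planner(query);
}

/* A hypothetical extension loaded earlier: it adjusts the plan */
static toy_planner_fn other_prev_hook = NULL;

static int
other_hook(int query)
{
    int plan = other_prev_hook ? other_prev_hook(query) : toy_standard_planner(query);

    return plan + 1;
}

static void
other_init(void)
{
    other_prev_hook = toy_planner_hook;
    toy_planner_hook = other_hook;
}

/* Our extension: same shape as ban_sus_query_planner_hook */
static toy_planner_fn prev_hook = NULL;
static int our_hook_calls = 0;

static int
our_hook(int query)
{
    int plan;

    our_hook_calls++;
    if (prev_hook != NULL)
        plan = prev_hook(query);            /* delegate to the previous hook */
    else
        plan = toy_standard_planner(query);
    /* ... inspect the plan here, as is_sus_query does ... */
    return plan;
}

static void
our_init(void)
{
    prev_hook = toy_planner_hook;           /* remember whatever was installed before */
    toy_planner_hook = our_hook;
}
```

If `other_init()` runs before `our_init()`, then `toy_planner(3)` goes through `our_hook`, then `other_hook`, then the standard planner, and returns `31`. The earlier extension is not silently bypassed.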
Now we are ready to add the "business logic", but first let's understand what such suspicious queries look like.

## Examine queries

A suspicious query is a `DELETE`/`UPDATE` query that does not contain predicates.

One benefit of checking the already planned statement is that its predicates have already been optimized, in the sense that boolean simplification rules have been applied.

A query plan is a tree of `Plan` nodes. Each `Plan` contains `lefttree`/`righttree` (the left and right children) and `qual` (the list of predicates applied at this node). But we must check only `UPDATE`/`DELETE` nodes, not every node, and the node type for them is `ModifyTable`.

Thus our goal is:

> traverse the plan tree, find `ModifyTable` and check that its `qual` is not empty

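The traversal half of this goal can be sketched outside the server. `Plan`, `NodeTag` and the `nqual` counter below are hypothetical simplifications of PostgreSQL's `Plan` node. The real node has many more fields and stores predicates in a `List *qual`. `find_modify_table` is an illustrative helper, not a PostgreSQL function.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for PostgreSQL plan nodes (not the real definitions) */
typedef enum { T_SeqScan, T_Result, T_ModifyTable } NodeTag;

typedef struct Plan
{
    NodeTag      type;      /* which kind of node this is */
    struct Plan *lefttree;  /* left (or only) child */
    struct Plan *righttree; /* right child, NULL for single-child nodes */
    int          nqual;     /* stand-in for List *qual: 0 plays the role of NIL */
} Plan;

/* Depth-first search for the first ModifyTable node; NULL if there is none */
static Plan *
find_modify_table(Plan *plan)
{
    Plan *found;

    if (plan == NULL)
        return NULL;
    if (plan->type == T_ModifyTable)
        return plan;
    found = find_modify_table(plan->lefttree);
    if (found != NULL)
        return found;
    return find_modify_table(plan->righttree);
}
```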
But first, let's run some sample queries to see what their plans look like inside and which predicates they have.

For tests we will use this setup:

```sql
-- Schema
CREATE TABLE tbl(x int);

-- Test queries
DELETE FROM tbl;
DELETE FROM tbl WHERE x = 0;

UPDATE tbl SET x = 1;
UPDATE tbl SET x = 1 WHERE x = 0;
```

To do this we will use our contrib. Install it with `make install`, add it to `shared_preload_libraries` (`shared_preload_libraries = 'ban_sus_query'`), restart the server, and put a breakpoint on the `return` in the `ban_sus_query_planner_hook` function.

When we run the first `DELETE` query, without a predicate, we see the following:

1. `PlannedStmt` contains a top-level `ModifyTable` with an empty `qual` list
2. The inner `SeqScan` also has an empty `qual` list

Now run the `DELETE` query with a predicate:

1. The top-level `ModifyTable` still has an empty `qual` list
2. The inner `SeqScan` now has a `qual` with a single element: the equality predicate

This is no surprise. `ModifyTable` does not filter anything itself; it just takes tuples from its child (note that, by convention, single-child nodes store their child in `lefttree`), so its `qual` is empty. The filtering is applied in `SeqScan`, and that is what we must check. This layout, with a single child of `ModifyTable` in `lefttree`, applies to PostgreSQL 14 and later; earlier versions kept a list of subplans in `ModifyTable` instead.

> As you may notice, the extension shows every `Node` variable with its actual type, instead of the generic `Plan` entry.
> The extension can also show the elements of container types (`List *` in this example).
> Moreover, it renders `Expr` nodes (expressions) as they appear in the query, so you do not have to inspect each field manually to figure out which expression it is.
>
> With a plain debugger you would have to evaluate 2 expressions: first get the `NodeTag`, then cast the variable to the type that tag denotes.
> In this example, to show `stmt->planTree` all you need to do is expand the tree node in the variables explorer, but manually (without the extension) you need to evaluate (e.g. in `watch`) 2 expressions:
>
> 1. `((Node *)planTree)->type` to get `T_ModifyTable`, the tag of the `ModifyTable` node, and then
> 2. `(ModifyTable *)planTree` to show the variable with its real type.
>
> Such manipulations take roughly 5 seconds, but this time accumulates and can add up to 1 hour a day, just to view variables' contents!
>
> A plain debugger offers no such support for `Expr` variables either: you will not see their representation.
> For that you have to dump the variable to the log using the `pprint` function, which is not very convenient when developing in an IDE.

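The two watch expressions above work because every PostgreSQL node struct begins with a `NodeTag`, and specific plan nodes embed the generic `Plan` as their first member. The sketch below imitates that layout with hypothetical, trimmed-down types and a local `TOY_IS_A` macro standing in for `IsA`. It illustrates the C idiom behind "check the tag, then cast", not PostgreSQL's real headers.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down imitation of PostgreSQL's node layout */
typedef enum { T_SeqScan, T_ModifyTable } NodeTag;

/* Every node starts with its tag, so any node pointer can be read as a Node * */
typedef struct Node
{
    NodeTag type;
} Node;

typedef struct Plan
{
    NodeTag      type;      /* first member, like in real plan nodes */
    struct Plan *lefttree;
    struct Plan *righttree;
} Plan;

typedef enum { CMD_UPDATE, CMD_INSERT, CMD_DELETE } CmdType;

typedef struct ModifyTable
{
    Plan    plan;           /* the generic part is embedded first */
    CmdType operation;
} ModifyTable;

/* Imitation of IsA(): compare the tag stored at the start of the node */
#define TOY_IS_A(ptr, tag) (((const Node *) (ptr))->type == T_##tag)

/* Step 1: check the tag; step 2: cast to the concrete type (NULL otherwise) */
static ModifyTable *
as_modify_table(Plan *plan)
{
    if (plan != NULL && TOY_IS_A(plan, ModifyTable))
        return (ModifyTable *) plan;
    return NULL;
}
```

Because `plan` is the first member, `&mt.plan` and `&mt` point to the same address, which is what makes the cast back to `ModifyTable *` valid.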
Now we are ready to write some code.

## `is_sus_query` implementation

To restate, our goal is to *traverse the plan tree, find `ModifyTable` and check that its `qual` is not empty*, but now we can refine it:

> Search for `ModifyTable` in the `Plan` tree and check that its children have a non-empty `qual` list

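The difference between the original goal and the refined one shows up on hand-built trees for the two `DELETE` statements. The types are hypothetical simplifications (`nqual` replaces `List *qual`), and `naive_has_predicates`/`child_has_predicates` are illustrative helpers, not part of the contrib.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified plan node; nqual stands in for List *qual (0 == NIL) */
typedef enum { T_SeqScan, T_ModifyTable } NodeTag;

typedef struct Plan
{
    NodeTag      type;
    struct Plan *lefttree;
    struct Plan *righttree;
    int          nqual;
} Plan;

/* Original goal: look only at the ModifyTable's own qual */
static bool
naive_has_predicates(const Plan *modify)
{
    return modify->nqual > 0;
}

/* Refined goal: look at the child, where the filter actually lives */
static bool
child_has_predicates(const Plan *modify)
{
    return modify->lefttree != NULL && modify->lefttree->nqual > 0;
}
```

The naive check reports "no predicates" for both `DELETE FROM tbl` and `DELETE FROM tbl WHERE x = 0`, so it cannot tell them apart; only the check on the child can.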
Since tree traversal is naturally recursive, we will use 2 recursive functions:

- `is_sus_query`: the main function, which traverses the plan tree to find a `ModifyTable` node and, when it finds one, invokes...
- `contains_predicates`: a function that checks whether the given `Plan` subtree contains any predicate

Let's start with `is_sus_query`. All we have to do here is check whether the `Plan` is a `ModifyTable` and, if so, check that its children contain predicates.

Node type checking is a frequent operation, so the extension ships with some snippets. One of them is `isaif`, which expands to an `if (IsA())` check.

Once we have determined that the node is a `ModifyTable`, check that the operation is `DELETE` or `UPDATE`, because `ModifyTable` is also used for other operations, e.g. `INSERT`. This is easy: just check the `operation` member.

```c
static bool
is_sus_query(Plan *plan)
{
    /* ... */
    ModifyTable *modify = (ModifyTable *) plan;

    switch (modify->operation)
    {
        case CMD_UPDATE:
        case CMD_DELETE:
            /* Check predicates */
            break;
        default:
            /* INSERT and other operations are not checked */
            break;
    }
    /* ... */
}
```

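The `operation` filter can be exercised in isolation. The enum below is a hypothetical subset of PostgreSQL's `CmdType` (the real one has more members), and `needs_predicate_check` is an illustrative helper rather than part of the contrib.

```c
#include <stdbool.h>

/* Hypothetical subset of PostgreSQL's CmdType enum */
typedef enum { CMD_SELECT, CMD_UPDATE, CMD_INSERT, CMD_DELETE, CMD_MERGE } CmdType;

/* Only UPDATE and DELETE are candidates for the "no predicates" check */
static bool
needs_predicate_check(CmdType operation)
{
    switch (operation)
    {
        case CMD_UPDATE:
        case CMD_DELETE:
            return true;
        default:
            /* INSERT, MERGE, ... fall through to "not checked" */
            return false;
    }
}
```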
Now check that these operations contain predicates using the `contains_predicates` function (defined in the next section).

Also, do not forget to handle recursion: call `is_sus_query` for the children and handle the base case (`NULL`).

The resulting function looks like this:

```c
/* Forward declaration: contains_predicates is defined below */
static bool contains_predicates(Plan *plan);

static bool
is_sus_query(Plan *plan)
{
    /* Recursion end */
    if (plan == NULL)
        return false;

    if (IsA(plan, ModifyTable))
    {
        ModifyTable *modify = (ModifyTable *) plan;

        switch (modify->operation)
        {
            case CMD_UPDATE:
            case CMD_DELETE:
                /* Rows to modify come from the single child in lefttree */
                return !contains_predicates(modify->plan.lefttree);
            default:
                break;
        }
    }

    /* Handle recursion */
    return is_sus_query(plan->lefttree) || is_sus_query(plan->righttree);
}
```

## `contains_predicates` implementation

Now let's implement the actual check for predicates in `contains_predicates`. Inside this function we must check whether the given `Plan` contains predicates.

The situation is complicated by the fact that we are given only a generic `Plan` and do not know the shape of the actual query. For example, this query:

```sql
DELETE FROM t1 USING t2 WHERE t1.x = t2.x;
```

will have a JOIN in the `lefttree` of `ModifyTable`.

So we have to decide what exactly `contains_predicates` must check. To keep things simple, we will just search the subtree for the first node with any predicate.

```c
static bool
contains_predicates(Plan *plan)
{
    /* Recursion end */
    if (plan == NULL)
        return false;

    /* This node filters rows itself */
    if (plan->qual != NIL)
        return true;

    /* Otherwise look for a predicate anywhere below */
    return contains_predicates(plan->lefttree) || contains_predicates(plan->righttree);
}
```

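Putting the two functions together, the logic can be exercised outside the server on hand-built trees. This standalone sketch reuses the tutorial's function names but runs on hypothetical simplified node types: the integer `nqual` replaces `List *qual`, and `IsA` is replaced by a direct tag comparison. It checks the traversal logic only, not the PostgreSQL integration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified plan nodes; nqual stands in for List *qual (0 == NIL) */
typedef enum { T_SeqScan, T_ModifyTable } NodeTag;
typedef enum { CMD_UPDATE, CMD_INSERT, CMD_DELETE } CmdType;

typedef struct Plan
{
    NodeTag      type;
    struct Plan *lefttree;
    struct Plan *righttree;
    int          nqual;
} Plan;

typedef struct ModifyTable
{
    Plan    plan;       /* first member, so a ModifyTable * can be used as a Plan * */
    CmdType operation;
} ModifyTable;

/* true if this node or any node below it has a non-empty qual */
static bool
contains_predicates(Plan *plan)
{
    if (plan == NULL)
        return false;
    if (plan->nqual > 0)
        return true;
    return contains_predicates(plan->lefttree) || contains_predicates(plan->righttree);
}

/* true if the tree contains an UPDATE/DELETE whose subtree has no predicates */
static bool
is_sus_query(Plan *plan)
{
    if (plan == NULL)
        return false;
    if (plan->type == T_ModifyTable)
    {
        ModifyTable *modify = (ModifyTable *) plan;

        if (modify->operation == CMD_UPDATE || modify->operation == CMD_DELETE)
            return !contains_predicates(modify->plan.lefttree);
    }
    return is_sus_query(plan->lefttree) || is_sus_query(plan->righttree);
}
```

Here `DELETE FROM tbl` is modeled as a `ModifyTable` over a `SeqScan` with `nqual == 0`, and `UPDATE tbl SET x = 1 WHERE x = 0` as the same shape with `nqual == 1` on the scan.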
## Testing

First things first: test the example queries defined above:

```sql
postgres=# delete from tbl;
ERROR: DML query does not contain predicates
postgres=# delete from tbl where x = 0;
DELETE 0

postgres=# update tbl set x = 0;
ERROR: DML query does not contain predicates
postgres=# update tbl set x = 0 where x = 0;
UPDATE 0
```

It works as expected.

Also, because our check runs on the final plan, after the planner has simplified the predicates, we can handle less obvious cases, like:

```sql
postgres=# delete from tbl where true;
ERROR: DML query does not contain predicates
```

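## Further improvements

This is just the beginning of the contrib: there are lots of corner cases that are not handled yet.

For example, if we change `true` to `false` in the last query, we still get an `ERROR`.
That is because the planner realizes that the scan cannot return any rows, so it replaces it with a "dummy" plan: a `Result` node with a `false` one-time filter, so nothing is returned. The one-time filter is stored in the `Result` node's `resconstantqual` field, not in `qual`, so `contains_predicates` does not see it.

Join and index conditions leave a similar gap. They are kept in node-specific fields (for example `Join.joinqual`, `HashJoin.hashclauses`, `MergeJoin.mergeclauses` and `IndexScan.indexqual`) rather than in `qual`, so a statement whose only predicate ends up in one of those fields may be rejected as well.
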
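The `WHERE false` case can be reproduced on hand-built trees. In this sketch the node types are hypothetical simplifications: `nqual` stands in for `List *qual`, and the boolean `has_one_time_filter` stands in for a non-NULL `resconstantqual` on a `Result` node. `contains_predicates_or_filter` is one possible refinement, not the tutorial's code.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical simplified nodes; a Result keeps its one-time filter apart from qual */
typedef enum { T_SeqScan, T_Result, T_ModifyTable } NodeTag;

typedef struct Plan
{
    NodeTag      type;
    struct Plan *lefttree;
    struct Plan *righttree;
    int          nqual;                 /* stand-in for List *qual */
    bool         has_one_time_filter;   /* stand-in for resconstantqual != NULL */
} Plan;

/* The tutorial's check: only looks at qual */
static bool
contains_predicates(const Plan *plan)
{
    if (plan == NULL)
        return false;
    if (plan->nqual > 0)
        return true;
    return contains_predicates(plan->lefttree) || contains_predicates(plan->righttree);
}

/* Hypothetical refinement: a one-time filter also counts as a predicate */
static bool
contains_predicates_or_filter(const Plan *plan)
{
    if (plan == NULL)
        return false;
    if (plan->nqual > 0 || plan->has_one_time_filter)
        return true;
    return contains_predicates_or_filter(plan->lefttree) ||
           contains_predicates_or_filter(plan->righttree);
}
```

Whether to accept every one-time filter is a design choice. Such a refinement would also let through other conditions that the planner turns into one-time filters, not only a constant `false`.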
## Result

So far we have seen how to quickly create a new contrib with a single command that generates all the necessary files.

To write some templated code, we used the `isaif` snippet to quickly add a node type check.

We also traversed the query plan tree and inspected its nodes without having to obtain the `NodeTag` and cast to the given type manually, which greatly speeds up debugging.

And, as the icing on the cake, we saw the expression representations of predicates. For our purposes this is not a big deal, because the query contained only 1 predicate, but in large queries with dozens of different predicates it is a lifesaver.