-
Notifications
You must be signed in to change notification settings - Fork 9
/
README.txt
340 lines (253 loc) · 15.2 KB
/
README.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
README for the PostgreSQL Index Adviser
1. Introduction
============
The Index Adviser is an extension for PostgreSQL, that allows the RDBMS to
recommend a set of potentially useful indexes for a given query.
Following are the components of the Index Adviser.
i) An extension to query planner. That is, a new backend module, that
interacts with the planner to come up with suggestions. These suggestions are
then stored in the advise_index table for later analysis.
ii) The pg_advise_index tool; it is a separate contrib module now. It takes a
workload (a set of SQL queries) as it's input, and outputs SQL statements that,
if executed, will create the indexes suggested by the Index Adviser.
pg_advise_index also has an option where you can specify the maximum size of
indexes you can afford (disk space restrictions), and it will try to come up
with only those indexes that would fit in that size with the greatest benefit.
iii) A pl/pgsql function, advise_index_show(). This function interprets the
contents of the advise_index table, and outputs SQL statements (similar to
pg_advise_index) that, when executed, will create the indexes suggested by the
Index Adviser.
iv) A few helper SQL queries like advise_index_summary.sql, that help in
interpreting the Index Adviser output stored in the advise_index table.
2. Installation Instructions
=========================
i) Apply the patch to the sources, recompile, and install postgres.
ii) Execute 'make' for this contrib module and the pg_advise_index module.
You can execute 'make install' in the pg_advise_index module; doing that
will install the pg_advise_index binary in the bin/ directory. But,
doing a 'make install' for this (pg_index_adviser) contrib module will
not help, you will have to manually install plugin generated by this
module. Copy the libpg_index_adviser (.dll or .so) file to your
PostgreSQL's $libdir/plugins/ directory.
iii) Create the table advise_index, using the script advise_index.create.sql
provided in this contrib module. Make sure that the user, under which
you wish to generate advisory, has proper permissions on it (INSERT is
the bare minimum, but having UPDATE and DELETE permissions too wouldn't
hurt).
iv) Create the advise_index_show() plpgsql function using the script
advise_index_show.create.sql provided in this contrib module. This
function is not a requisite for proper functioning of the Index
Adviser, but it will help you interpret the advisory generated and
inserted into advise_index table by the Adviser.
Notes
-----
While working with the Adviser, if you get an error like:
ERROR: relation "advise_index" does not exist
CONTEXT: SQL statement "insert into advise_index values( 16395,
array[1], 1782.973755, 1752, 11796, now());"
then it means that the required advise_index table is either not created or
the current user does not have INSERT permissions on it. Please create this
table as mentioned in step (ii) above.
Also, you should make sure that you do not enable the Index Adviser in
read-only transaction, as that will cause the internal INSERT statements to
throw ERRORs similar to above.
3. User Interface
==============
If a user intends the RDBMS to recommend indexes, then she must provide it
with a workload, that is, a sample set of queries that are expected to be
executed by the application.
There are three ways of getting index advisory from the backend:
i) pg_advise_index tool.
---------------------
Create a file that contains all the queries (semicolon terminated; may
be multi-line) that are expected to be executed by the application; and
feed this file to the pg_advise_index tool with appropriate options.
pg_advise_index -d DB -h host -U user -s 10M -o advisory.sql workload.sql
pg_advise_index will open a connection with the PostgreSQL server by
setting appropriate session level options that will force the backend to load
the pg_index_adviser plugin. It will then prepend the keywords EXPLAIN to each
of the queries found in the workload file, and execute them against the backend.
For each query EXPLAINed, the backend will generate advice for each index that
might have been beneficial in executing these queries.
At the end, pg_advise_index will enumerate all the indexes suggested for
the current session, and output the CREATE INDEX statements for each of them.
Optinally, if the -size option was specified, pg_advise_index will output suggestions
for only those indexes, that fit into that size.
ii) Manually (through psql session)
--------
If you wish to execute the queries manually, through psql, then you must
perform all the steps that pg_advise_index performs internally.
Connect to the database using psql. Make sure you set the environment variable PGOPTIONS for the backend to load the plugin for you. You can use a command similar to:
$env PGOPTIONS='-c local_preload_libraries=libpg_index_adviser' psql.exe db1
EXPLAIN every query that you think will be excuted by the
application. The Index Adviser will store the suggestions in the
advise_index table.
The EXPLAIN output will contain the normal plan and, if any indexes were
suggested, another plan with the suggested indexes dumped in the server log file
as a LOG output. If you wish to see these messages on your psql screen as well,
then you must set the client_min_messages GUC variable to at least LOG; like so:
set client_min_messages to log;
This new plan starts with a line like:
Index Adviser: Plan using estimates for suggested indexes:
When you are done EXPLAIN'ing all the queries, execute the following
pl/pgsql function to see the output. (Note: you must have created the
advise_index_show() function using the script available in this contrib module)
select advise_index_show( pg_backend_pid() );
If you wish, you can omit pg_backend_pid() function call inside the
parameter list, and invoke it as follows. Both these invocations have the same
effect:
select advise_index_show( null );
The advise_index_show() can also be helpful in enumerating advisory
generated by any previous runs of the Index Adviser; possibly other
sessions or even previous invocations of pg_advise_index tool.
For example, the following query will enumerate all the advices
generated till now:
select E'backend_pid\n'
|| E'===========\n'
|| backend_pid,
advise_index_show( backend_pid )
from (select distinct backend_pid
from advise_index as adv
where adv.reloid in (select oid
from pg_class
where relkind = 'r')
) as v;
iii) Automatically (at runtime, for every query executed by the application)
-----------------------------------------------------------------------
Just make sure that you set the environment varibale PGOPTIONS similar to:
'-c local_preload_libraries=libpg_index_adviser'
before you run your application. Either set this at the start of your
application, or set it in the postgresql.conf file (not recommended, but works).
This will result in the Index Adviser dumping it's advisory in the
advise_index table; and then you can use the above mentioned methods to
interpret the advisory.
4. Architecture
============
The Index Adviser consists of a plugin (.dll or .so file), a command line
tool and a pl/pgsql function. The planner calls the plugin, and that in turn
exploits virtual or hypothetical indexes. This kind of indexes are simulated
during the query planning only.
For an incoming EXPLAIN command, the planner generates the plan and, if the
plugin is loaded, then the query is sent to the Index Adviser for any
suggestions it can make. The Adviser derives a set of potentially-useful indexes
(index candidates) for this query by analyzing the query predicates. These
indexes are inserted into the system catalog as virtual indexes; that is, they
are not created on disk.
Then, the query is again sent to the planner, and this time the planner
makes it's decisions taking the just-created vitual indexes into account
too. All index candidates used in the final plan represent the
recommendation for the query and are inserted into the advise_index table
by the Adviser.
The gain of this recommendation is estimated by comparing the execution cost
difference of this plan to the plan generated before virtual indexes were
created.
Based on the index recommendations collected in the advise_index table the
optimal index configuration for a given set of workloads under disk space
restrictions can be determined by solving the knapsack problem. This is done by
the pg_advise_index tool with a -s (size) option.
5. The advise_index table
======================
The Index Adviser inserts it's advisory into a table named advise_index. This
table should be in the excuting user's search_path, and she should have INSERT
permission on it. The user can create this table using the
advise_index.create.sql script provided in this contrib module.
In this table, the index recommendations are collected for each query
running under control of the Index Adviser. The table is structured as
follows:
column | type | meaning
-------------+-----------+-------------------------------------------------
reloid | oid | the oid of the base table for this index
attrs | integer[] | an array containing the indexed column numbers
benefit | real | the estimated benefit of this index for this query
index_size | integer | the estimated size of the index (in disk-pages)
backend_pid | integer | pid of the backend to uniquely identify the source.
timestamp | timestamp | Can be used in conjunction with backend_pid.
Note: The benefit of an index is estimated as the fraction of the overall benefit
of all recommended index candidates for a given query
(i.e. overall_benefit = cost(without-virtual-indexes)
- cost(query-with-virtual-indexes))
based on the size of this index compared to the overall size of all indexes
recommended for this query.
Here's a sample of the contents of advise_index table:
select * from advise_index where backend_pid = pg_backend_pid();
reloid | attrs | benefit | index_size | backend_pid | timestamp
--------+-------+---------+------------+-------------+------------------------
16395 | {1} | 1782.72 | 2608 | 5256 | 2007-01-13 13:08:54.25
6. Backend source code modifications
=================================
This section gives a brief description of the modifiations made in
PostgreSQL backend. The following modifications were necessary:
.) Enable the planner.c to call a post-planner-plugin at the end of planner().
src\backend\optimizer\plan\planner.c
src\include\optimizer\planner.h
.) Enable the optimizer to get estimated number of pages of v-index from plugin.
src\backend\optimizer\util\plancat.c
.) Made a few misc. changes to help the Index Adviser, like a new function to
get string representation of a Plan, exporting a few global variables for the
Windows platform.
src\backend\commands\explain.c
src\include\commands\explain.h
src\include\utils\guc.h
src\include\miscadmin.h
7. Index Adviser Internals
=======================
1. If the plugin is loaded, then, at the end of the planner() function, backend
calls the main (Driver) function of the plugin: index_adviser(query, plan...);
where 'plan' is the already computed plan for the query.
2. Then, the query is analyzed (scan_query()) in order to get a list of
potential index candidates, i.e. indexes on columns used in query predicates or
for joining or grouping or sorting. This includes multi-column indexes used in
predicates containing AND conditions.
3. Next, all irrelevant candidates are removed
(remove_irrelevant_index_candidates()), e.g. indexes which already exist, or that
are trying to index system tables or temporary tables.
4. The remaining candidates are created as virtual indexes without populating
them with the data (create_virtual_indexes()).
5. Then, the planner is called and the costs are compared with those of the
previously estimated Plan.
6. If the cost difference exceeds a given limit, the plan is scanned
(scan_plan()) in order to extract the virtual indexes that were used in this
plan. The referenced virtual indexes represent the index recommendation.
7. The virtual indexes are removed from the catalog (drop_virtual_indexes()).
8. Finally, the benefit per index is estimated and the recommendation is written
to the advise_index table (save_advice()).
Note: A major portion of the Index Adviser runs inside a SubTransaction, so that
just rolling the transation back helps in easy reversal of all the catalog
changes that were made as a side effect of creating virtual indexes.
Also, as of now, the adviser creates only non-unique B-Tree virtual indexes.
i. Credits
=======
Kai-Uwe Sattler ([email protected]):
Design.
Multicolumn indexes
pg_advise (now called pg_advise_index tool)
Mario Stiffel ([email protected]):
Initial implementation. The algorithm worked as follows:
In ExplainOneQuery(), call indexadvisor() before normal call to planner().
indexadvisor() creates a copy of the Query* and runs planner() on it.
Save the costs returned by planner().
Invoke scan_query() on the Query* to create index candidates.
Prune irrelevant candidates using remove_irrelevant_index_candidates().
Create indexes using create_virtual_indexes(); these were real indexes.
Create another copy of the original Query* and call planner() on it.
Calculate the costs saved by using (supposedly) virtual indexes.
Call scan_plan() to mark the virtual indexes used in the new plan.
Call drop_virtual_indexes() to destroy the indexes previously created.
Remove the candidates from the list that are not marked as 'used'.
Call save_advice_to_catalog() to save the advisory (details about the
'used' candidates) into a catalog table named pg_indexadvisor.
Martin Luehring ([email protected]):
Improvements for multicolumn indexes.
Bug fixes for memory leaks.
Gurjeet Singh ([email protected]):
Ported code from 7.4.8 to 8.2.
Major code refactoring, to bring it in line with coding practices at PG.
Removed an extra call to planner() and an extra copy of Query*.
Added code to identify BitmapHeapScan, BitmapIndexScan, BitmapAnd, BitmapOr.
Now we also show the new plan with the suggested indexes in EXPLAIN output.
Now the index is not created on disk; no time spent in creating virt-index.
All the catalog changes are done and rolled-back within a sub-transaction.
Eliminated the need for a catalog table. advise_index can be a user table.
Added a keyword to EXPLAIN command: EXPLAIN ADVISE <query>. {reverted}
Removed all GUC variables; replaced with elog messages at DEBUG levels.
Added the ability to advise in execution (non-EXPLAIN, GUC var controlled).