This tool extracts data from a PostgreSQL database with greater flexibility than tools like pg_dump allow.
pg_seldump reads one or more dump definitions from YAML files and selects
which tables or other database objects to save. It is possible to extract only
certain columns of the tables, only certain records, or to replace certain
values with a different expression, for instance to anonymize data.
The output of the program is a text file which can be used by psql to restore data into a database with a complete schema but with no data (or at least no conflicting data), e.g. using:
    $ pg_seldump --dsn="dbname=sourcedb" datadump.yaml > dump.sql
    ...
    $ psql -1X --set ON_ERROR_STOP=1 -f dump.sql "dbname=targetdb"
Usage:
    pg_seldump [-h] [--version] [--dsn DSN] [--outfile OUTFILE] [--test] [-q | -v] config [config ...]

    Create a selective dump of a PostgreSQL database.

    positional arguments:
      config                yaml file describing the data to dump

    optional arguments:
      -h, --help            show this help message and exit
      --version             show program's version number and exit
      --dsn DSN             database connection string [default: '']
      --outfile OUTFILE, -o OUTFILE
                            the file where to save the dump [default: stdout]
      --test                test the configuration to verify it works as expected
      -q, --quiet           talk less
      -v, --verbose         talk more
The config files must be YAML files containing a db_objects list of entries. Each entry may have:
Selectors (all the specified ones must match):
- name: name of the db object to dump
- names: list of names or regexp of db objects to dump
- schema: schema name of the db object to dump
- schemas: list of schema names or regexp to match schema names of the db object to dump
- kind: kind of object to match. Can be:
  - table
  - sequence
  - partitioned table
  - materialized view
- kinds: list of kinds of objects to match (like for kind)
- adjust_score: adjustment for the match score to break ties between rules
Note: Sequences are selected automatically if they are used in default values by dumped tables.
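As an illustration of the selectors, a config file might look like the following minimal sketch. All object and schema names here (customers, orders, audit, and so on) are hypothetical. The sketch also assumes that a plain string given to names or schemas is read as a regexp, while a YAML list is read as a list of exact names.

```yaml
# Hypothetical example: none of these object names come from a real database.
db_objects:
  # Match one table by exact name.
  - name: customers

  # Match several objects by exact name (YAML list).
  - names:
      - orders
      - order_items

  # Match by regexp (assumption: a string value is read as a regexp),
  # restricted to a single schema.
  - names: "^log_.*$"
    schema: audit

  # All the specified selectors must match: only materialized views
  # in schemas whose name starts with "report".
  - schemas: "^report.*$"
    kind: materialized view
```

Entries that specify no action are dumped, since dump is the default action.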
Data modifiers:
- action: what to do with the matched object:
  - dump: dump the object in the output (default)
  - skip: don't dump the object
  - error: raise an error in case of match (useful to create strict descriptions where all the db objects must be mentioned explicitly)
- no_columns: list of column names to omit
- filter: WHERE condition to include only a subset of the records in the dump
- replace: mapping from column names to SQL expressions to replace values in the dump with something else
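The data modifiers could be combined as in the sketch below. The table and column names (users, password_hash, email, sessions) are hypothetical, and the sketch assumes that filter takes the condition itself, without the WHERE keyword.

```yaml
# Hypothetical example: table and column names are made up for illustration.
db_objects:
  # Dump only recent rows, leaving out a sensitive column
  # and anonymizing another one.
  - name: users
    action: dump            # the default, shown here for clarity
    no_columns:
      - password_hash
    # Assumption: the value is the condition only, without "WHERE".
    filter: "created_at > now() - interval '1 year'"
    replace:
      # The SQL expression is dumped in place of the column value.
      email: "'user' || id || '@example.com'"

  # Leave this table out of the dump.
  - name: sessions
    action: skip

  # Strict description: any table not mentioned by a more specific
  # rule raises an error.
  - kind: table
    action: error
```

The last entry only uses the kind selector, so entries that select by name are more specific and take precedence over it.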
The objects in the database are matched against the rules in the config files. Every match gets a score according to how specific the selector that matched the object is:
- name or names list: 1000
- names regexp: 500
- schema or schemas list: 100
- schemas regexp: 50
- kind or kinds: 10
The rule with the highest score will apply. If two rules have exactly the same score, the program will report an error: you can use adjust_score to break the tie.
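To make the tie-breaking concrete, consider the sketch below and a hypothetical table public.tmp_2020_archive, which matches all three rules. The names are made up, and the sketch assumes that adjust_score is simply added to the score of the rule and that a string value for names is read as a regexp.

```yaml
# Hypothetical example: rule scores are shown in the comments.
db_objects:
  # names regexp: 500
  - names: "^tmp_.*$"
    action: skip

  # names regexp: 500, raised to 501 by adjust_score
  # (assumption: the adjustment is added to the score).
  - names: "^.*_archive$"
    action: dump
    adjust_score: 1

  # schema (100) and kind (10): lower than either rule above
  # whichever way the two values are combined.
  - schema: public
    kind: table
    action: dump
```

Without adjust_score the first two rules would both score 500 for that table and the program would report an error. With it, the second rule has the highest score and the table is dumped.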