- Update to pg_query_go v2.0.4
- Normalize: Don't touch "GROUP BY 1" and "ORDER BY 1" expressions, keep original text
- Fingerprint: Cache list item hashes to fingerprint complex queries faster (this change also significantly reduces memory usage for complex queries)
- Install script: Support CentOS in addition to RHEL
- Docker: Use Docker's USER command to set user, to support running as non-root
- This enables the collector container to run in environments that require the whole container to run as a non-root user, which previously was not the case.
- For compatibility reasons the container can still be run as root explicitly, in which case the setpriv command is used to drop privileges. setpriv replaces gosu since its available for installation in most distributions directly, and fulfills the same purpose here.
- Selfhosted: Support running log discovery with non-localhost db_host settings
- Previously this was prevented by a fixed check against localhost/127.0.0.1, but sometimes one wants to refer to the local server by a non-local IP address
- AWS: Add support for AssumeRoleWithWebIdentity
- This is useful when running the collector inside EKS in order to access AWS resources, as recommended by AWS: https://docs.aws.amazon.com/eks/latest/userguide/specify-service-account-role.html
- Statement stats retrieval: Get all rows first, before fingerprinting queries
- This avoids showing a bogus ClientWrite event on the Postgres server side whilst the collector is running the fingerprint method. There is a trade-off here, because we now need to retrieve all statement texts (for the full snapshot) before doing the fingerprint, leading to a slight increase in memory usage. Nonetheless, this improves debuggability, and avoids bogus statement timeout issues.
- Track additional meta information about guided setup failures
- Fix reporting of replication statistics for more than 1 follower
- Update to pg_query_go 2.0.2
- Normalize: Fix handling of two subsequent DefElems (resolves rare crashes)
- Redact primary_conninfo setting if present and readable
- This can contain sensitive information (full connection string to the primary), and pganalyze does not do anything with it right now. In the future, we may partially redact this and use primary hostname information, but for now, just fully redact it.
- Update to pg_query 2.0 and Postgres 13 parser
- This is a major upgrade in terms of supported syntax (Postgres 10 to 13), as well as a major change in the fingerprints, which are now shorter and not compatible with the old format.
- When you upgrade to this version of the collector you will see a break in statistics, that is, you will see new query entries in pganalyze after adopting this version of the collector.
- Amazon RDS: Support long log events beyond 2,000 lines
- Resolves edge cases where very long EXPLAIN plans would be ignored since they exceeded the previous 2,000 limit
- We now ensure that we go back up to 10 MB in the file with each log download that happens, with support for log events that exceed the RDS API page size limit of 10,000 log lines
- Self-managed: Also check for the process name "postmaster" when looking for Postgres PID (fixes data directory detection for RHEL-based systems)
- Docker builds: Increase stack size to 2MB to prevent rare crashes
- Alpine has a very small stack size by default (80kb) which is less than the default that Postgres expects (100kb). Since there is no good reason to reduce it to such a small amount, increase to usually common Linux default of 2MB stack size.
- This would have surfaced as a hard crash of the Docker container with error code 137 or 139, easily confused with out of memory errors, but clearly distinct from it.
- Reduce timeout for accessing EC2 instance metadata service
- Previously we were re-using our shared HTTP client, which has a rather high timeout (120 seconds) that causes the HTTP client to wait around for a long time. This is generally intentional (since it includes the time spent downloading a request body), but is a bad idea when running into EC2's IDMSv2 service that has a network-hop based limit. If that hop limit is exceeded, the requests just go to nowhere, causing the client to wait for a multiple of 120 seconds (~10 minutes were observed).
- Don't use pganalyze query marker for "--test-explain" command
- The marker means the resulting query gets hidden from the EXPLAIN plan list, which is what we don't want for this test query - it's intentional that we can see the EXPLAIN plan we're generating for the test.
- Add support for receiving logs from remote servers over syslog
- You can now specify the new "db_log_syslog_server" config setting, or "LOG_SYSLOG_SERVER" environment variable in order to setup the collector as a syslog server that can receive logs from a remote server via syslog to the server that runs the collector.
- Note that the format of this setting is "listen_address:port", and its recommended to use a high port number to avoid running the collector as root.
- For example, you can specify "0.0.0.0:32514" and then send syslog messages to the collector's server address at port 32514.
- Note that you need to use protocol RFC5424, with an unencrypted TCP connection. Due to syslog not being an authenticated protocol it is recommended to only use this integration over private networks.
- Add support for "pid=%p,user=%u,db=%d,app=%a,client=%h " and
"user=%u,db=%d,app=%a,client=%h " log_line_prefix settings
- This prefix misses a timestamp, but is useful when sending data over syslog.
- Log parsing: Correctly handle %a containing commas/square brackets
- Note that this does not support all cases since Go's regexp engine does not support negative lookahead, so we can't handle an application name containing a comma if the log_line_prefix has a comma following %a.
- Ignore CSV log files in log directory #83
- Some Postgres installations are configured to log both standard-format log files and CSV log files to the same directory, but the collector currently reads all files specified in a db_log_location, which works poorly with this setup.
- Tweak collector sample config file to match setup instructions
- Improvements to "--discover-log-location"
- Don't keep running if there's a config error
- Drop the log_directory helper command and just fetch the setting from Postgres
- Warn and only show relative location if log_directory is inside the data directory (this requires special setup steps to resolve)
- Improvements to "--test-logs"
- Run privilege drop test when running log test as root, to allow running "--test-logs" for a complete log setup test, avoiding the need to run a full "--test"
- Update pg_query_go to incorporate memory leak fixes
- Check whether pg_stat_statements exists in a different schema, and give a clear error message
- Drop support for Postgres 9.2
- Postgres 9.2 has been EOL for almost 4 years
- Update to Go 1.16
- This introduces a change to Go's certificate handling, which may break certain older versions of Amazon RDS certificates, as they do not include a SAN. When this is the case you will see an error message like "x509: certificate relies on legacy Common Name field".
- As a temporary workaround you can run the collector with the GODEBUG=x509ignoreCN=0 environment setting, which ignores these incorrect fields in these certificates. For a permanent fix, you need to update your RDS certificates to include the correct SAN field: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/UsingWithRDS.SSL-certificate-rotation.html
- Config parsing improvements:
- Fail fast when pganalyze section is missing in config file
- Ignore duplicates in db_name config setting
- Previously this could cause malformed snapshots that would be submitted correctly but could not be processed
- Validate db_url parsing to avoid collector crash with invalid URLs
- Include pganalyze-collector-setup program (see 0.35 release notes) in supported packages
- Rename
<unidentified queryid>
query text placeholder to<query text unavailable>
- This makes it clearer what the underlying issue is
- Revert to using
<truncated query>
instead of<unparsable query>
in some situations- When a query is cut off due to pg_stat_activity limit being reached,
show
<truncated query>
, to make it clear that increasing track_activity_query_size would solve the issue
- When a query is cut off due to pg_stat_activity limit being reached,
show
- Ignore I/O stats for AWS Aurora utility statements
- AWS Aurora appears to report incorrect blk_read_time and blk_write_time values for utility statements (i.e., non-SELECT/INSERT/UPDATE/DELETE); we zero these out for now
- Fix log-based EXPLAIN bug where query samples could be dropped if EXPLAIN failed
- Add U140 log event (inconsistent range bounds)
- e.g.: ERROR: range lower bound must be less than or equal to range upper bound
- Fix issue where incomplete schema information in snapshots was not marked correctly
- This could lead to schema objects disappearing and being re-created
- Fix trailing newline handling for GCP and self-hosted log streams
- This could lead to queries being poorly formatted in the UI, or some queries with single-line comments being ignored
- Include additional collector configuration settings in snapshot metadata for diagnostics
- Ignore "insufficient privilege" queries w/o queryid
- Previously, these could all be aggregated together yielding misleading stats
- Add new "pganalyze-collector-setup" program that streamlines collector installation
- This is initially targeted for self-managed servers to make it easier to set up the collector and required configuration settings for a locally running Postgres server
- To start, this supports the following environments:
- Postgres 10 and newer, running on the same server as the collector
- Ubuntu 14.04 and newer
- Debian 10 and newer
- Collector test: Show server URLs to make it easier to access the servers in pganalyze after the test
- Collector test+reload: In case of errors, return exit code 1
- Ignore manual vacuums if the collector can't access pg_stat_progress_vacuum
- Don't run log test for Heroku, instead provide info message
- Also fixes "Unsupported log_line_prefix setting: ' sql_error_code = %e '" error on Heroku Postgres
- Add pganalyze system user to adm group in Debian/Ubuntu packages
- This gives the collector permission to read Postgres log files in a default install, simplifying Log Insights setup
- Handle NULL parameters for query samples correctly
- Add a skip_if_replica / SKIP_IF_REPLICA option (#117)
- You can use this to configure the collector in a no-op mode on replicas (we only query if the monitored database is a replica), and automatically switch to active monitoring when the database is no longer a replica.
- Stop building packages for CentOS 6 and Ubuntu 14.04 (Trusty)
- Both of these systems are now end of life, and the remaining survivor of the CentOS 6 line (Amazon Linux 1) will be EOL on December 31st 2020.
- Check and report problematic log collection settings
- Some Postgres settings almost always cause a drastic increase in log volume for little actual benefit. They tend to cause operational problems for the collector (due to the load of additional log parsing) and the pganalyze service itself (or indeed, likely for any service that would process collector snapshots), and do not add any meaningful insights. Furthermore, we found that these settings are often turned on accidentally.
- To avoid these issues, add some client-side checks in the collector to disable log processing if any of the problematic settings are on.
- The settings in question are:
- log_min_duration_statement less than 10ms
- log_statement set to 'all'
- log_duration set to 'on'
- log_error_verbosity set to 'verbose'
- If any of these are set to these unsupported values, all log collection will be disabled for that server. The settings are re-checked every full snapshot, and can be explicitly re-checked with a collector reload.
- Log Insights improvements
- Self-managed server: Process logs every 3 seconds, instead of on-demand
- Self-managed server: Improve handling of multi-line log events
- Google Cloud SQL: Always acknowledge Pub Sub messages, even if collector doesn't handle them
- Optimize stitching logic for reduced CPU consumption
- Explicitly close temporary files to avoid running out of file descriptors
- Multiple changes to improve debugging in support situations
- Report collector config in full snapshot
- This reports certain collector config settings (except for passwords/keys/credentials) to the pganalyze servers to help with debugging.
- Print collector version at beginning of test for better support handling
- Print collection status and Postgres version before submitting snapshots
- Change panic stack trace logging from Verbose to Warning
- Report collector config in full snapshot
- Add support for running the collector on ARM systems
- Note that we don't provide packages yet, but with this the collector can be built on ARM systems without any additional patches.
- Introduce API system scope fallback
- This fallback is intended to allow changing the API scope, either based on user configuration (e.g. moving the collector between different cloud provider accounts), or because of changes in the collector identify system logic.
- The new "api_system_scope_fallback" / PGA_API_SYSTEM_SCOPE_FALLBACK config variable is intended to be set to the old value of the scope. When the pganalyze backend receives a snapshot with a fallback scope set, and there is no server created with the regular scope, it will first search the servers with the fallback scope. If found, that server's scope will be updated to the (new) regular scope. If not found, a new server will be created with the regular scope. The main goal of the fallback scope is to avoid creating a duplicate server when changing the scope value
- Use new fallback scope mechanism to change scope for RDS databases
- Previously we identified RDS databases by there ID and region only, but the ID does not have to be unique within a region, it only has to be unique within the same AWS account in that region. Thus, adjust the scope to include both the region and AWS Account ID (if configured or auto-detected), and use the fallback scope mechanism to migrate existing servers.
- Add support for GKE workload identity Yash Bhutwala #91
- Add support for assuming AWS instance roles
- Set the role to be assumed using the new
aws_assume_role
/AWS_ASSUME_ROLE
configuration setting. This is useful when the collector runs in a different AWS account than your database.
- Set the role to be assumed using the new
- Ignore internal admin databases for GCP and Azure
- This avoids collecting data from these internal databases, which produces unnecessary errors when using the all databases setting.
- Add log_line_prefix check to GCP self-test
- Schema stats handling: Avoid crash due to nil pointer dereference
- Add support for "%m [%p]: [%l-1] db=%d,user=%u " log_line_prefix
- Add helper for log-based EXPLAIN access and use if available
- This lets us avoid granting the pganalyze user any access to the data to follow the principle of least privilege
- See https://github.com/pganalyze/collector#setting-up-log-explain-helper
- Avoid corrupted snapshots when OIDs get reused across databases
- This would have shown as data not being visible in pganalyze, particularly for servers with many databases where tables were dropped and recreated often
- Locked relations: Ignore table statistics, handle other exclusive locks
- Tables being rewritten would cause the relation statistics query to fail due to statement timeout (caused by lock being held)
- Non-relation locks held in AccessExclusiveLock mode would cause all relation information to disappear, but only for everything thats not the top-level relation information. This is due to the behaviour of NOT IN when the list contains NULLs (never being true, even if an item doesn't match the list). The top-level relation information was using a LEFT JOIN that doesn't suffer from this problem. This likely caused problems reported as missing index information, or indices showing as being recently created even though they've exited for a while.
- Improvements to table partitioning reporting
- Enable additional settings to work correctly when used in Heroku/Docker
- DB_NAME
- DB_SSLROOTCERT_CONTENTS
- DB_SSLCERT_CONTENTS
- DB_SSLKEY_CONTENTS
- Add
ignore_schema_regexp
/IGNORE_SCHEMA_REGEXP
configuration option- This is like ignore_table_pattern, but pushed down into the actual stats-gathering queries to improve performance. This should work much better on very large schemas
- We use a regular expression instead of the current glob-like matching since the former is natively supported in Postgres
- We now warn on usage of the deprecated
ignore_table_pattern
field
- Add warning for too many tables being collected (and recommend
ignore_schema_regexp
) - Allow keeping of unparsable query texts by setting
filter_query_text: none
- By default we replace everything with
<unparsable query>
(renamed from the previous<truncated query>
for clarity), to avoid leaking sensitive data that may be contained in query texts that couldn't be parsed and that Postgres itself doesn't mask correctly (e.g. utility statements) - However in some situations it may be desirable to have the original query texts instead, e.g. when the collector parser is outdated (right now the parser is Postgres version 10, and some newer Postgres 12 query syntax fails to parse)
- To support this use case, a new "filter_query_text" / FILTER_QUERY_TEXT option is introduced which can be set to "none" to keep all query texts.
- By default we replace everything with
- EXPLAIN plans / Query samples: Support log line prefix without %d and %u
- Whilst not recommended, in some scenarios changing the log_line_prefix is difficult, and we want to make it easy to get EXPLAIN data even in those scenarios
- In case the log_line_prefix is missing the database (%d) and/or the user (%u), we simply use the user and database of the collector connection
- Log EXPLAIN: Run on all monitored databases, not just the primary database
- Add support for stored procedures (new with Postgres 11)
- Handle Postgres error checks using Go 1.13 error helpers
- This is more correct going forward, and adds a required type check for the error type, since the database methods can also return net.OpError
- Fixes "panic: interface conversion: error is *net.OpError, not *pq.Error"
- Collect information on table partitions
- Relation parents as well as partition boundary (if any)
- Partitionining strategy in use
- List of partitioning fields and/or expression
- Log Insights: Track TLS protocol version as a log line detail
- This allows verification of which TLS versions were used to connect to the database over time
- Log Insights: Track host as detail for connection received event
- This allows more detailed analysis of which IPs/hostnames have connected to the database over time
- Example collector config: Use collect all databases option in the example
- This improves the chance that this is set up correctly from the beginning, without requiring a backwards incompatible change in the collector
- Add Log Insights support for Azure Database for PostgreSQL
- Log Insights: Avoid unnecessary "Timeout" error when there are other failures
- Log EXPLAIN: Don't run EXPLAIN logic when there are no query sample
- Improve non-fatal error messages to clarify the collector still works
- Log grant failure: Explain root cause better (plan doesn't support it / fair use limit reached)
- Track local replication lag in bytes
- RDS: Handle end of log files correctly
- High-frequency query collection: Avoid race condition, run in parallel
- This also resolves a memory leak in the collector that was causing increased memory usage over time for systems that have a lot of pg_stat_statements query texts (causing the full snapshot to take more than a minute, which triggered the race condition)
- Package builds: Use Golang 1.14.3 patch release
- This fixes golang/go#37436 which was causing "mlock of signal stack failed: 12" on Ubuntu systems
- Switch to simpler tail library to fix edge case bugs for self-managed systems
- The hpcloud library has been unmaintained for a while, and whilst the new choice doesn't have much activity, in tests it has shown to work better, as well as having significantly less lines of code
- This also should make "--test" work reliably for self-managed systems (before this returned "Timeout" most of the time)
- Index statistics: Don't run relation_size on exclusively locked indices
- Previously the collector was effectively hanging when it encountered an index that has an ExclusiveLock held (e.g. due to a REINDEX)
- Add another custom log line prefix: "%m %r %u %a [%c] [%p] "
- RDS fixes
- Fix handling of auto-detection of AWS regions outside of us-east-1
- Remember log marker from previous runs, to avoid duplicate log lines
- Add support for Postgres 13
- This adds support for running against Postgres 13, which otherwise breaks due to backwards-incompatible changes in pg_stat_statements
- Note that there are many other new statistics views and metrics that will be added separately
- Add "db_sslkey" and "db_sslcert" options to use SSL client certificates
- Add Ubuntu 20.04 packages
- Update to Go 1.14, latest libpq
- Ensure that we set system type correctly for Heroku full snapshots
- Detect cloud providers based on hostnames from DB_URL / db_url as well
- Previously this was only detected for the DB_HOST / db_host setting, and that is unnecessarily restrictive
- Note that this means your instance may show up under a new ID in pganalyze after upgrading to this version
- Log Explain
- Ignore pg_start_backup queries
- Support EXPLAIN for queries with parameters
- Log Insights improvements
- Experimental: Google Cloud SQL log download
- Remove unnecessary increment of log line byte end position
- Make stream-based log processing more robust
- Add direct "http_proxy" & similar collector settings for Proxy config
- This avoids problems in some environments where its not clear whether the environment variables are set. The environment variables HTTP_PROXY, http_proxy, HTTPS_PROXY, https_proxy, NO_PROXY and no_proxy continue to function as expected.
- Fix bug in handling of state mutex in activity snapshots
- This may have been the cause of "unlock of unlocked mutex" errors when having multiple servers configured.
- Activity snapshot: Track timestamp of previous activity snapshot
- Support setting custom AWS endpoints using environment variables
- Increase allowed characters for pid field to 7 (for 64-bit systems)
- This supports environments where pid_max is set to 4194304 instead of 32768
- Note that this means continuity with old backend/vacuum identities is lost i.e. data might show up incorrectly for existing processes after the upgrade
- Add new wait events from Postgres 12
- Avoid unnecessary allocations in ReplaceSecrets function
- Rename "aws_endpoint_rds_signing_region" to "aws_endpoint_signing_region"
- This more accurately reflects how the setting is used. Backwards compatibility is provided, but its recommended to migrate to the new config setting (when in use for custom AWS API endpoints)
- Update rds-ca-2019-root.pem file to be correct certificate
- This was identical to the 2015 certificate by accident, causing connection errors
- Add new "log_explain" mode, as an alternative to auto_explain (experimental)
- Enable by setting "enable_log_explain: 1" or "PGA_ENABLE_LOG_EXPLAIN=1"
- This is intended for providers such as Heroku Postgres where you can't use the auto_explain extension to send EXPLAIN plans into pganalyze
- Add support for Azure Database for PostgreSQL
- Add support for Google Cloud SQL
- Use pg_has_role to determine pg_monitor membership
- This improves handling of nested memberships, which previously were not detected correctly
- Generalize almost-superuser detection to support Azure and Cloud SQL better
- Add support for running "--test --reload" to test and reload if successful
- This makes it easier to not forget reloading the collector after making a change
- Vacuum progress: Ignore "(to prevent wraparound)" in query text
- Update distributions for packaging to reflect current versions
- Remove Ubuntu Precise (its been EOL for 2 years)
- Remove Fedora 24 (its been EOL for a while)
- Add Fedora 29
- Add Fedora 30
- Add RHEL8
- Add Debian 10 ("Buster")
- Import RDS 2019 CA root certificate
- This is now available by specifying "db_ssl_root_cert = rds-ca-2019-root"
- Update builds and tests to use Go 1.13
- Allow HTTP-only proxy connections when specified by the user
- Allow all replication LSN fields to be null
- Add full context for pg_stat_statements error messages
- Docker: Use entrypoint, provide easy "test" command, hide timestamps
- Amazon RDS
- Add support for custom AWS endpoints
- Include RDS root certificate for Docker builds
- Automatically detect RDS instance ID from Docker env variables as well
- Allow the ECS task metadata service
- Show verbose AWS credentials chain errors
- Self-hosted: Ignore additional file system types that are not important
- Fix helper process when using systemd service
- This unfortunately requires us to remove the "NoNewPrivileges" mode, since we intentionally use a setuid binary (pganalyze-collector-helper) to be able to discover the data directory as well as determine the size of the WAL directory
- Increase systemd service memory limit to 1GB
- We've previously limited this to 256MB for all use cases, which is too small when monitoring multiple systems
- Security: Lock down permissions for /etc/pganalyze-collector.conf
- Previously this was world-readable, which may make credentials accessible to more system users than intended
- Upgrading the packages will also apply this fix retroactively
- Allow full snapshots to run when pg_stat_statements is not fully enabled
- This provides for a smoother onboarding experience, as you can use the collector even if pg_stat_statements is not (yet) enabled through shared_preload_libraries
- RDS integration improvements
- Don't attempt to connect to rdsadmin database
- Accept rds_superuser as superuser for monitoring purposes
- Correctly handle enhanced monitoring disk partitions
- Self-hosted system helper: Explicitly look for process called "postgres"
- This avoids issues where there is another process that starts postgres and runs earlier than the actual postgres process
- Add support for compact snapshots saving to local files
- Enable logs and activity snapshots by default
- Note that they are disabled when requested by server, to avoid overwhelming the server with compact snapshot grants
- Reduce memory consumption by only storing the required query texts
- This also introduces two additional special query texts that get sent:
- "<pganalyze-collector>" which identifies internal collector queries
- "<insufficient privilege>" which identifies permission errors
- This also introduces two additional special query texts that get sent:
- The collector now always normalizes query texts directly after retrieval
- Only display "You are not connecting as superuser" message during tests
- Config: Fix crash when configuration is read from environment only
- Add Postgres 12 support
- Process each configured server section in parallel
- This avoids problems when a high number of servers is configured, since previously they would be processed serially, leading to skewed statistics for servers processed later in the sequence
- Introduce log filtering for PII and other kinds of secrets
- This is controlled by the new "filter_log_secret" configuration setting
- Remove explicit connection to EC2 metadata service #27
- Gather total partitioned/inheritance children table size
- Correctly retrieve distributed table size for Citus extension tables
- Ensure connections are encrypted and made using TLS 1.2
- Build improvements
- Update builds and tests to use Go 1.12
- Switch to new Go module system instead of gvt
- Vacuum monitoring
- Filter out results with insufficient privileges
- Previously we would error out hard in this case, which isn't helpful and can stop the usage of activity snapshots on shared systems
- Correctly close DB connection on error
- Filter out results with insufficient privileges
- Connection establishment: Make sure to close connection on early errors
- Add support for "%t [%p]: [%l-1] [trx_id=%x] user=%u,db=%d " log_line_prefix
- Add LOG_LOCATION environment config variable (same as db_log_location)
- Note: We don't support the equivalent of the experimental setting db_log_docker_tail since it would require the "docker" binary inside the pganalyze container (as well as full Docker access), instead the approach for using pganalyze as a sidecar container alongside Postgres currently requires writing to a file and then mounting that as a volume inside the pganalyze container
- TOAST handling
- Track size of TOAST table separately
- This can often be useful to determine whether the bulk of a table is in TOAST storage, or in the main storage, and thus reads may behave slightly differently
- Fix bug in detection of HasToast for tables
- Previously we recorded this the wrong way around, i.e. tables that had no TOAST would have been flagged as having TOAST. This hasn't been used thusfar in terms of stats processing, but might be in the future, so better to have this correct
- Track TOAST flag for autovacuum and buffercache statistics
- Track size of TOAST table separately
- Schema-qualify functions/tables wherever possible
- Whilst not much of problem in practice, since the collector doesn't run as superuser, it doesn't hurt to schema-qualify everything
- This also introduces an explicit "SCHEMA public" for CREATE EXTENSION statements to support non-standard search paths better
- Extract schema/relation name from autovacuum log events
- This is done to make it easier to link autovacuum log events to the corresponding vacuum statistics records
- Include partitioned base tables in the table information gathered
- Historic statement stats: Ignore any data older than 1 hour
- There has been some cases where the state structure doesn't get reset and the historic statement stats keeps growing and growing. Add this as a safety measure to ensure a run can complete successfully
- Fix scoping of on-disk state to reflect system type/scope/id
- Previously we only considered API key to determine which state to save, which meant that in configurations with multiple servers but a single organization API key we'd usually loose statistics on restarts, or get the wrong values for diff-ing the query statistics
- This is a backwards-incompatible change for the on-disk format and therefore bumps the version from 1 to 2. Effectively this will show as one period of no data after upgrading to this version, as the previously saved counter values in the state file won't be used
- Add support for "%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h " log_line_prefix
- Use explicit log file clean up instead of deferrals
- There have been reports of old temporary files containing log lines not being cleaned up fully - attempt to fix that
- systemd config: Fix incorrect specification of memory limit & restart event Dom Hutton #26
- Fix supported log_line_prefix list to include recently added prefixes
- Add additional supported log_line_prefix settings
- '%t [%p]: [%l-1] user=%u,db=%d - PG-%e '
- Add new log classifications
- WAL_BASE_BACKUP_COMPLETE
- SERVER_STATS_COLLECTOR_TIMEOUT
- Add "sorry, too many clients already" out of connections log classification
- Add additional supported log_line_prefix settings
-
- '%m [%p] %q[user=%u,db=%d,app=%a] '
-
- '%m [%p] %q[user=%u,db=%d,app=%a,host=%h] '
-
- Add support for [local] when using %r in log_line_prefix
- Make --discover-log-location work when using monitoring user
- Correctly detect pg_wal directory for Postgres 10 and newer
- Introduce setting for maximum collector connections
- This previously existed as a hard-coded 5 connection maximum based on the pganalyze-collector application name in pg_stat_activity
- Adds "max_collector_connections" configuration setting to override
- Increases default max connections to 10 to better support activity snapshots
- Writes an error to the log instead of panicing when limit is reached
- Add experimental support for Docker log monitoring
- Adds "db_log_docker_tail" setting to specify the container name
- Allows monitoring the logs of a Postgres instance running inside Docker, when running the collector outside (on the Docker Host)
- Add support for Postgres 11
- Support log_hostname=1 when using log_line_prefix that contains %r
- Duplicate config detection: Differentiate by API key and API base url
- Add configuration setting to disable high-frequency query statistics
- Log parser: Add support for offset-based timezone identifier
- Previously we assumed that Postgres always outputs the timezone name, that is not correct for some timezones, where instead an offset like "+0100" would be printed.
- Fixes to experimental sequences report
- Document monitoring helper functions for sequences report
- Introduce once-per-minute query statistics collection
- Enabled for PostgreSQL 9.4+
- This replaces the old logic for query stats without text (statement_text_frequency), and is always active
- Statistics data gets sent with every full snapshot
- Backend counts: Support retrieving stats for Postgres 9.5 and older
- Log Insights: Add support for detecting aggressive vacuums (Postgres 11+)
- Parse serialization failure log events (U138 and U139)
- Have systemd restart the collector after crashes Dom Hutton #23
- Don't error out on pg_stat_replication.replay_lsn being NULL #21
- Support basic vacuum information for Postgres 9.5 and older
- Track connection counts per database and per role
- Add ignore_table_pattern / IGNORE_TABLE_PATTERN option
- Avoid errors when collecting from AWS Aurora for Postgres instances
- Log Insights improvements
- Better setup help
- Increase read rate for RDS log downloader from 100 to 1000 lines
- Add support for extracting EXPLAIN plans from auto_explain output
- Fix autovacuum, autoanalyze and checkpoint completed parsing for PG 10
- Avoid sending unparsed explain text for truncated log lines
- Detect vacuum/analyze skipped log lines
- Set username/database name correct for Heroku logstreams
- Support additional ryslog format that contains user/db/app information
- Update to Go 1.10
- Add support for Log Insights on self-hosted systems
- Add additional log classifications, speed up analysis by reusing regexps
- Add "--analyze-logfile" option to test collector with local logfiles
- Associate truncated queries to the correct error fingerprint
- Update to Go 1.9.3
- Update pg_query_go to Postgres 10 and fingerprint version 2
- This is a breaking change in collector output, as queries will now be fingerprinted differently
- Activity snapshots
- Use pg_stat_activity helper when it exists
- Track VACUUM progress in activity snapshots
- Activity data: Ignore backends that are not visible to the user
- Allow additional digits for PID in pg_stat_activity Joseph Bylund
- Don't collect backend data for full snapshot anymore, this is all delegated to activity snapshots now
- Update to Go 1.9.2
- RDS pgss check: Add additional safety against nil pointer dereferences
- Logs: Fix regexp for 9.5 vacuum output (skip pins, but not skip frozen)
- Update to Go 1.9.1 release
- Allow disabling collection of relation/schema information
- Add experimental activity snapshots
- This is not for public consumption yet, and trying to use it will result in an error from the server - but watch this space :)
- Support for Postgres 10 monitoring role
- Log Insights improvements
- Fix bug where referenced query wouldn't be correctly identified
- Collect query text and parameters for all query samples
- Fix issues with Heroku Postgres log collection
- Update pg_query_go / libpg_query to 9.5-1.6.2
- Updates the query fingerprinting logic to avoid seeing different FETCH/DECLARE/CLOSE cursor names as unique queries - statistics on this are not going to be useful in most cases, and will clog the processing pipeline
- Updates the query fingerprinting logic to ignore the table name for CREATE TEMPORARY TABLE
- Updates the query fingerprinting logic to better handle the values list for INSERT statements to group complex, but similar statements together
- Support specifying db_sslmode=verify-full and passing certificate information
using db_sslrootcert / db_sslrootcert_contents
- The collector packages now also ship a set of known DB-as-a-Service CA certificates, starting with the often needed rds-ca-2015-root certificate (just pass that term instead of a path to db_sslrootcert)
- Support for Postgres 10
- Heroku: Support specifying configuration name in log drain endpoints
- RDS: Ensure to delete temporary log files quickly after they are submitted
- Add support for Heroku logdrains
- Log Monitoring
- Upload encrypted log data to S3, and only send byte ranges in snapshot
- Implement log classification
- Add --version flag to show current collector version
- Replication stats: Allow replay location to be null
- Add support for error and success callbacks
- Introduce server-controlled ability to reset pg_stat_statements
- SystemScope: Include DbAllNames status for local collections
- Fix wording of some log messages
- Refactor log collection and query sampling / explaining
- Introduce ability to collect statement text less often
- Make statement timeout a server-controlled option
- Allow enabling/disabling automatic EXPLAIN from server-side
- Fix collection of replication statistics for non-superusers
- Add monitoring helper for replication statistics
- Update to Go 1.8 in all builds
- Disable verbose logging on Heroku
- Add SystemID for all types of systems
- Change default config to be in account-based format
- Support collecting schema info/stats from multiple databases per server
- Allow monitoring all databases using DB_ALL_NAMES=1 env variable
- Fix issue with helper having wrong executable format
- Cleanup test mechanism in test/ folder
- Handle null relation sizes for temp tables
- Collect replication statistics
- Fix edge case that made RDS system metrics code crash
- Add VACUUM and Sequence reports
- Make bloat report work under the restricted user
- Add option to run a Go performance trace on a single test run
- Improve error tracking
- Update pg_query_go
- Cut off fingerprints at 100 nodes deep to avoid excessive runtimes/memory
- Prevent leaks of previous scheduler runs when reloading.
- New Heroku support based on user API keys
- Support for new Reports feature (in private beta right now)
- Add PGA_ALWAYS_COLLECT_SYSTEM_DATA to force collection of system data
- Increase statement timeout to 30 seconds to account for some larger databases
- Support for writing snapshots to local filesystem (needed by pganalyze Enterprise)
- Improved first user experience
- Add "--reload" command for sending SIGHUP to daemon process
- Show error message when configuration file is empty
- Experimental build support for Solaris
- System metrics: Various fixes
- Packaging: Add support for Ubuntu Precise / 12.04 LTS
- Introduce "pganalyze-collector-helper": Setuid Binary that can be used to run privileged actions when the main collector is running as non-root (the default)
- Determine the correct distance between two collector runs (instead of assuming 600 seconds)
- Better monitoring for self-hosted systems
- Collect missing Disk I/O statistics
- Fix calculation logic for disk utilization
- Collect kernel version and architecture
- Don't monitor the local loopback network interface
- Sort disk/partition/network interface names before output
- Don't collect local system information when monitoring remote hosts
- Packaging
- Update to Go 1.7
- Systemd: Enforce memory limit of 256mb for the collector
- Correctly identify PostgreSQL data directory and pg_xlog location
- Avoid potential NaN values in disk stats for self-hosted systems
- Don't write state file for dry runs by default
- PostgreSQL 9.2, 9.3 and 9.6 Support
- Adjust default config and state file path to match packages
- Allow using postgres driver default values for connection credentials
- Add support for logging to syslog instead of stderr
- Init scripts for systemd, upstart and sysvinit (see contrib/ directory)
- Packaging scripts for common Linux distributions (see packages/ directory)
- First official release of new protocol buffers-based collector
- Significant restructuring of the codebase
- We're now sending data using the protocol buffers format
- Snapshot data is directly uploaded to S3
- Query, table and system statistics are diff-ed on the client side
- Support for monitoring system metrics on self-hosted systems is added again
- New safety mechanisms against stuck/slow collector runs
- Add support for RDS enhanced monitoring
- Simplify dependencies and document OSS licenses in use
- Bugfixes for AWS Instance Role handling
- Introduce new --diff-statements option (default off for now)
- This calculates the diff for the counter values of pg_stat_statements on the client (i.e. collector), instead of the server for increased accuracy and protection against out-of-order processing
- Introduce "opts" to the snapshot, for indicating which options were chosen
- Never open more than 1 connection to the same database
- This covers edge cases like sending a lot of SIGHUP signals
- In case we detect more than 1 connection we error out and exit, to avoid clogging the database
- Use AWS EC2 instance role if no credentials are specified
- Ensure pg_toast schema is excluded when calculating index bloat
- Send Postgres version to the server as well
- Add --test mode to ease initial setup
- Don't do an initial run when daemonizing (the default), this is mostly so we can keep a clean schedule and prevent issues if the config is temporarily wrong, or the server is unreachable for some reason
- Use POSIX commandline flags (double dash instead of single dash, shorthand flags) instead of Go's flag approach
- Added support for specifying sslmode when connecting, and default to "prefer"
- This also fixes an issue where beforehand we required SSL to always be present
- For maximum security you might want to set this to "require" or "verify-full"
- Re-introduced all required statistics currently used by pganalyze
- Initial Go re-release
- The collector now runs as a daemon (instead of through the crontab)
- We optionally write a pidfile, which you can use to SIGHUP for config changes
- You can specify multiple databases in the configuration file
- Support for fetching Amazon Web Service RDS data (CloudWatch and log files)
Changelog of original Python-based collector:
- Compress data using zlib by default (disable with --no-compression)
- Collect function definitions (disable with --no-postgres-functions)
- Collect normal and materialized views
- Update table/index bloat queries to use newest by pgExperts/Josh Berkus
- Disable bloat stats collection with --no-postgres-bloat
- Output timing information in verbose mode
- Added Dockerfile to enable running collector as a sidekick service
- Improved monitoring user support
- Add support for collecting backend info as restricted user
- Gracefully fail if we are not superuser
- psycopg2: Fix bugs, set connection timeout (10s)
- pg8000: Default to SSL connections, fallback to non-SSL
- Reset-less collector
- Calculate the diff on the receiver end for simplicity's sake
- Don't run pg_stat_statements_reset() anymore
- Make --no-reset option a no-op (will be removed soon)
- Restricted privileges
- Added support for using a monitoring user (see README)
- Removed hard superuser requirement
- Don't collect query information from pg_stat_activity (it might be sensitive)
- Remove --no-query-parameters option, its a no-op now
- Collect replication statistics
- Add option for disabling collection of postgres locks & config settings
- Remove support for pg_stat_plans, its not supported anymore
- Update vendored pg8000 to latest version (1.10.1)
Fixes:
- Do not require dbhost to be set
Re-release to fix build issues
Fixes:
- CREATE EXTENSION IF NOT EXISTS pg_stat_statements
- Auto-detect Amazon Web Service DB hosts as remote
Fixes:
- Drop dmidecode dependency to gather server vendor/model on Linux
- Fix whitespace in generated configuration file
Features & improvements:
- support for pg8000 as psycopg2 replacement
- collector can now be run from a zip file
- Split collector into separate modules
- import pg8000 & colorama
- Add conversation module for setup wizard
- Support for packaging to deb & rpm using fpm
- Move zip building to Makefile
Fixes:
- Don't replace newlines in collected queries with whitespace
- Ignore queries belonging to other databases
- pg_stat_statements support
- Switch from psql wrapper to psycopg2
- Collect CPU information from OS
- Ignore queries created by the collector
- Collecting more Postgres Information
- GUCs (configuration settings)
- BGWriter
- backends
- locks
- Collect information about Postgres schema
- Tables
- Indexes
- Bloat
- Switch to Python
- dry-run mode - see data before it's posted
- privacy mode - don't send query examples to API
- Collect OS (CPU, memory, storage) information
- Small fixes to config parsers and plan fetching
- dry-run mode
- Initial release of the Ruby Collector
- Support for fetching information from pg_stat_plans