
(postgre)SQL export support / TimeScaleDB #2814

Open
oliv3r opened this issue Jun 3, 2024 · 11 comments
@oliv3r

oliv3r commented Jun 3, 2024

Is your feature request related to a problem? Please describe.
I've already got a Postgres database running. Having to learn, set up, and maintain an InfluxDB instance just for Glances seems a bit frustrating.

Describe the solution you'd like
Since there are generic Python libraries that handle a multitude of SQL backends, this would be something nice to have. As for 'time-series data doesn't fit traditional databases': that is quite true, but there is also TimescaleDB, an extension to Postgres that should be able to handle this just fine. So a bit of extra effort might be needed to support TimescaleDB instead of plain Postgres.

Describe alternatives you've considered
Setting up influxdb :(

@nicolargo nicolargo changed the title (postgre)SQL export support (postgre)SQL export support / TimeScaleDB Jun 8, 2024
@nicolargo nicolargo added this to the Glances 4.1.0 milestone Jun 8, 2024
@nicolargo
Owner

@oliv3r most of the time, Glances exports data to NoSQL databases (InfluxDB, ELK, MongoDB...). Before inserting data into TimescaleDB/Postgres, Glances needs to create a database and relational tables. I am a little bit confused about this data model.

I think that the best way is to create one table per plugin. The documentation also talks about hypertables, perhaps better suited to the Glances data?

Can you give me a simple example of a CREATE TABLE command for the following plugins?

  • cpu (a key/value dictionary): {'total': 0.0, 'user': 2.0, 'nice': 0.0, 'system': 0.0, 'idle': 32.0, 'iowait': 0.0, 'irq': 0.0, 'steal': 0.0, 'guest': 0.0, 'ctx_switches': 0, 'interrupts': 0, 'soft_interrupts': 0, 'syscalls': 0, 'cpucore': 16, 'time_since_update': 2.406388282775879, 'ctx_switches_gauge': 91081182, 'interrupts_gauge': 64894442, 'soft_interrupts_gauge': 23251131, 'syscalls_gauge': 0}
  • diskio (a list of key/value dictionaries): [{'read_count': 0, 'write_count': 0, 'read_bytes': 0, 'write_bytes': 0, 'key': 'disk_name', 'disk_name': 'dm-0', 'time_since_update': 2.3856899738311768, 'read_count_gauge': 146671, 'write_count_gauge': 303436, 'read_bytes_gauge': 4313531392, 'write_bytes_gauge': 5372370944}, {'read_count': 0, 'write_count': 0, 'read_bytes': 0, 'write_bytes': 0, 'key': 'disk_name', 'disk_name': 'dm-1', 'time_since_update': 2.3856899738311768, 'read_count_gauge': 146609, 'write_count_gauge': 301750, 'read_bytes_gauge': 4312515584, 'write_bytes_gauge': 5372370944}]

@oliv3r
Author

oliv3r commented Jun 12, 2024

I think that the best way is to create one table per plugin.

I have no idea what's best here; I always assumed one table per plugin too. But I noticed that their own 'stock-exchange' example actually uses a single table, though that data is somewhat correlated: there are different stock symbols for different companies, but the shape of the data is the same. Managing things also becomes different, because with separate tables you have to create/drop a table whenever a symbol is added/removed. IMO one table per plugin still makes logical sense.

But then I know that Home Assistant also puts all its sensor data into a single table. This still puzzles me, because there the data is not correlated at all; I would expect each sensor to have its own table. And that extends to here as well: each plugin/sensor should have its own table. There surely must be a performance benefit. Asking AI, it also says performance should be better with multiple tables, the downside being that they are more work to manage, while the upside of a single table is that related queries on a single timestamp might be faster if you store each plugin in its own columns. But I can't figure out why the single-table option would be better.

Regardless, while you can store JSON directly in Postgres, that's probably not what you have in mind ;)
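For completeness, the lazy version would be a single generic table (a sketch only, untested; table and column names are just my pick):

CREATE TABLE "glances" (
  "time" TIMESTAMP WITH TIME ZONE NOT NULL,
  "plugin" TEXT NOT NULL,
  "stats" JSONB NOT NULL
);

One table for everything, but you lose typed columns and most of the query ergonomics, so proper tables below.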

CPU

CREATE TABLE "cpu" (
  "time" TIMESTAMP WITH TIME ZONE NOT NULL,
  "total" NUMERIC(8, 2),
  "user" NUMERIC(8, 2),
  "nice" NUMERIC(8, 2),
  "idle" NUMERIC(8, 2),
  "iowait" NUMERIC(8, 2),
  "irq" NUMERIC(8, 2),
  "steal" NUMERIC(8, 2),
  "guest" NUMERIC(8, 2),
  "ctx_switches" INTEGER,
  "interrupts" INTEGER,
  "soft_interrupts" INTEGER,
  "syscalls" INTEGER,
  "cpucore" INTEGER,
  "time_since_update" DOUBLE PRECISION NOT NULL,
  "ctx_switches_guage" INTEGER,
  "interrupts_guage" INTEGER,
  "soft_interrupts_guage" INTEGER,
  "syscalls_guage" INTEGER
);

This creates a regular Postgres table.

For the low-range integer types, SMALLINT might be more appropriate, but the docs state to use it 'only if space is at a premium', as it's slightly slower (internally everything most probably still works with 32-bit ints anyway).

Some ints might need BIGINT, but I'm not sure about the ranges from your example ;)

I think for TimescaleDB to work effectively, you'd have to use actual timestamps instead of 'time_since_update', but I'm not a TimescaleDB expert, which is why I added 'time' as well. Also, INTERVAL might be a better type for 'time_since_update', but I'm not familiar enough with that type.
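To make that concrete, here's roughly what an insert would look like (a sketch only, with values taken from your cpu example; stamping each row with now() at sample time is my assumption of what Glances would do):

INSERT INTO "cpu" ("time", "total", "user", "system", "idle", "time_since_update")
VALUES (now(), 0.0, 2.0, 0.0, 32.0, 2.406388282775879);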

Then, if the timescale extension is available,

SELECT extversion FROM pg_extension WHERE extname = 'timescaledb';

we can convert it (and enable compression) to add a significant performance and storage benefit:

SELECT create_hypertable('cpu', by_range('time'));
CREATE INDEX idx_cpu_core_time ON "cpu" ("cpucore", "time" DESC);
ALTER TABLE "cpu" SET (
    timescaledb.compress,
    timescaledb.compress_segmentby='cpucore',
    timescaledb.compress_orderby='time DESC'
);
SELECT add_compression_policy('cpu', INTERVAL '8 days');

If the ALTER fails, the wrong license/container image was chosen, so this could just warn that we are continuing without compression.
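Actually, the license could be checked up front instead of relying on the ALTER failing (an assumption on my part that this is exposed on every build, but it works on the images I've tried):

SHOW timescaledb.license;

This should return 'timescale' on the community image (compression available) and 'apache' on the Apache-2-only build.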

Which segment/index column fits best here (or if anything fits at all) still needs to be figured out: 'cpucore' needs to be a column that makes sense to have an index on. If there is none, it might not be worthwhile to compress/have an index. So for now I assumed the stats are unique per CPU, but I know this is also not true... (load is system-wide)

From what I understood from the example here: https://docs.timescale.com/use-timescale/latest/compression/about-compression/#segment-by-columns it could be that in some cases you want to keep different data in similar tables.

For diskio, we'd end up with:

CREATE TABLE "diskio" (
  "time" TIMESTAMP WITH TIME ZONE NOT NULL,
  "read_count" BIGINT,
  "write_count" BIGINT,
  "read_bytes" BIGINT,
  "write_bytes" BIGINT,
  "disk_name" TEXT,
  "time_since_update" DOUBLE PRECISION NOT NULL,
  "read_count_guage" INTEGER,
  "write_count_guage" INTEGER,
  "read_bytes_guage" INTEGER,
  "write_bytes_guage" INTEGER
);
SELECT extversion FROM pg_extension WHERE extname = 'timescaledb';
SELECT create_hypertable('diskio', by_range('time'));
CREATE INDEX idx_disk_name_time ON "diskio" ("disk_name", "time" DESC);
ALTER TABLE "diskio" SET (
    timescaledb.compress,
    timescaledb.compress_segmentby='disk_name',
    timescaledb.compress_orderby='time DESC'
);
SELECT add_compression_policy('diskio', INTERVAL '8 days');

One other TimescaleDB-specific feature would be automated 'continuous aggregates', where Postgres/TimescaleDB automatically takes averages etc. over longer periods of time and stores them efficiently for quick access (e.g. for the zoomed-out view). https://blog.timescale.com/blog/achieving-the-best-of-both-worlds-ensuring-up-to-date-results-with-real-time-aggregation/
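Roughly like this (a sketch only, untested; the view name and the intervals are just what I'd pick):

CREATE MATERIALIZED VIEW "cpu_hourly"
WITH (timescaledb.continuous) AS
SELECT
  time_bucket(INTERVAL '1 hour', "time") AS "bucket",
  avg("total") AS "avg_total",
  max("total") AS "max_total"
FROM "cpu"
GROUP BY "bucket";

SELECT add_continuous_aggregate_policy('cpu_hourly',
  start_offset => INTERVAL '3 hours',
  end_offset => INTERVAL '1 hour',
  schedule_interval => INTERVAL '1 hour');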

@RazCrimson
Collaborator

If we do plan on creating the tables from the Glances end and go with one table per plugin, using something like SQLAlchemy to abstract out the DB connector implementation would probably be better.

Ref: https://www.sqlalchemy.org/

@nicolargo
Owner

@oliv3r thanks for the implementation proposal.

For the moment, export plugins do not have any feature to create tables with the needed information (variables are not typed in a standard way). For example, the CPU plugin fields description is the following: https://github.com/nicolargo/glances/blob/v4.0.8/glances/plugins/cpu/__init__.py#L24

...
    'system': {
        'description': 'Percent time spent in kernel space. System CPU time is the \
time spent running code in the Operating System kernel.',
        'unit': 'percent',
    },
... 

It would increase code complexity/maintenance just for the PostgreSQL export.

Another approach would be to init the tables outside of Glances, but as a consequence we would have to maintain the script/documentation used to create tables matching the Glances data model.
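For that second approach, the maintained init script could look something like this (a quick sketch based on @oliv3r's cpu table above, untested; the IF NOT EXISTS guards are so the script can be re-run safely):

CREATE TABLE IF NOT EXISTS "cpu" (
  "time" TIMESTAMP WITH TIME ZONE NOT NULL,
  "total" NUMERIC(8, 2),
  -- ... remaining columns as proposed above ...
  "time_since_update" DOUBLE PRECISION NOT NULL
);
SELECT create_hypertable('cpu', by_range('time'), if_not_exists => TRUE);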


This issue is available for anyone to work on. Make sure to reference this issue in your pull request. ✨ Thank you for your contribution ! ✨

@siddheshtv

Hi there, I am inclined towards contributing to this feature if it's still available to work on.
I know about SQLAlchemy and Postgres, and performance-wise I'm thinking multiple tables might be the way to go.

@RazCrimson
Collaborator

@siddheshtv Feel free to take a stab at this.

Though Glances still lacks typing at many layers, so defining the tables for all the plugins might be a bit hard.

@siddheshtv

Alright, I'll give it a go.


This issue is stale because it has been open for 3 months with no activity.

@oliv3r
Author

oliv3r commented Sep 28, 2024

Don't be such a downer bot
