Re: Asynchronous and "direct" IO support for PostgreSQL.

From: Alexey Lesovsky <alexey(dot)lesovsky(at)dataegret(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Asynchronous and "direct" IO support for PostgreSQL.
Date: 2021-02-24 16:15:14
Message-ID: c8a067ac-55df-42e8-57b0-d70cdd30e0bc@dataegret.com
Lists: pgsql-hackers

Hi,

Thank you for this amazing work.

On 23.02.2021 15:03, Andres Freund wrote:
> ## Stats
>
> There are two new views: pg_stat_aios showing AIOs that are currently
> in-progress, pg_stat_aio_backends showing per-backend statistics about AIO.

As a DBA, I would like to propose a few amendments that might help with
the practical usage of the stats once the feature is finally
implemented. My suggestions aren’t related to the central idea of the
proposed changes, but rather to the stats part.

A quick side note: Prometheus defines two metric types
(https://prometheus.io/docs/concepts/metric_types/):
1. Counter. A counter is a cumulative metric that represents a single
monotonically increasing counter whose value can only increase or be
reset to zero on restart.
2. Gauge. A gauge is a metric that represents a single numerical value
that can arbitrarily go up and down.
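Because a counter may be reset to zero on restart, a collector has to
treat any drop in value as a reset when computing how much the counter
grew. Below is a simplified sketch of that logic, assuming raw samples
taken in time order. Prometheus' own increase()/rate() functions also
handle resets, but they additionally extrapolate to the window edges,
and this sketch omits that step.

```python
def counter_increase(samples):
    """Total growth of a counter across time-ordered samples.

    Any decrease between consecutive samples is treated as a reset to
    zero, so the post-reset value counts entirely as new growth.
    """
    total = 0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev  # normal monotonic growth
        else:
            total += cur         # counter was reset (e.g. restart)
    return total


# Restart between the 2nd and 3rd samples: 50 + 30 + 50
print(counter_increase([100, 150, 30, 80]))  # 130
```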

For long-term stats collection, COUNTERs are preferred over GAUGEs,
because COUNTERs show how metrics change over time without missing
potential spikes in activity. As a result, we get a much better
historical perspective.

Measuring and collecting GAUGEs is limited to the moments in time when
the stats are taken (snapshots), so changes that take place between
snapshots remain unmeasured. In systems with a high transaction rate,
GAUGE measurements won’t provide the full picture even with a 1-second
interval between snapshots. In addition, most monitoring systems, such
as Prometheus and Zabbix, use longer intervals (from 10-15 to 60
seconds).
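A small simulation of the snapshot argument follows. The workload is
made up purely for illustration. It assumes 10 operations per second
with a 2-second burst of 500 per second, scraped every 10 seconds. The
gauge samples never see the burst. The rate derived from consecutive
counter samples does see it.

```python
from itertools import accumulate

# Hypothetical workload: operations completed in each 1-second interval,
# with a 2-second burst at t=12 and t=13.
ops_per_second = [10] * 30
ops_per_second[12] = ops_per_second[13] = 500

# A counter is the running total; a gauge is the instantaneous value.
counter = list(accumulate(ops_per_second))
gauge = ops_per_second

# A monitoring system scraping every 10 seconds sees only these points.
scrape_times = [9, 19, 29]
gauge_samples = [gauge[t] for t in scrape_times]
counter_samples = [counter[t] for t in scrape_times]

# Average rate between scrapes: (change in counter) / (elapsed seconds).
counter_rates = [
    (counter_samples[i + 1] - counter_samples[i])
    / (scrape_times[i + 1] - scrape_times[i])
    for i in range(len(scrape_times) - 1)
]

print(gauge_samples)  # [10, 10, 10]: the burst is invisible
print(counter_rates)  # [108.0, 10.0]: the burst shows up in the first window
```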

The main idea is to expose almost all numeric stats as COUNTERs, which
increases the overall observability of the implemented feature.

pg_stat_aios.
In general, this stat is a set of text values, and at the same time it
looks GAUGE-like (similar to pg_stat_activity or pg_locks), and is only
relevant for the moment when the user is looking at it. I think it would
be better to rename this view to pg_stat_progress_aios, and to keep
pg_stat_aios for other AIO stats based on global COUNTERs (like stuff in
pg_stat_user_tables or pg_stat_statements, or system-wide /proc/stat,
/proc/diskstats).

pg_stat_aio_backends.
This stat is based on COUNTERs, which is great, but the issue here is
that its lifespan is limited by the lifespan of the backend processes:
once a backend exits, its stats are no longer available. This could be
a problem in workloads with short-lived backends.
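One way to keep per-backend COUNTERs useful after backends exit is to
fold each exiting backend's counters into a cumulative total. This is
similar in spirit to how database-level counters outlive individual
sessions. The sketch below is hypothetical. The class and method names
are made up for illustration and do not describe PostgreSQL's actual
stats implementation.

```python
from collections import Counter


class AioStats:
    """Hypothetical registry: live per-backend counters plus a cumulative
    total that survives backend exit."""

    def __init__(self):
        self.per_backend = {}           # pid -> Counter of live backend stats
        self.exited_totals = Counter()  # counts flushed by exited backends

    def record(self, pid, metric, n=1):
        self.per_backend.setdefault(pid, Counter())[metric] += n

    def backend_exit(self, pid):
        # Fold the exiting backend's counters into the cumulative totals
        # instead of discarding them.
        self.exited_totals.update(self.per_backend.pop(pid, Counter()))

    def totals(self):
        # Sum over exited and live backends; non-decreasing as long as
        # record() is only called with non-negative n.
        total = Counter(self.exited_totals)
        for stats in self.per_backend.values():
            total.update(stats)
        return total


stats = AioStats()
stats.record(101, "reads", 5)
stats.record(102, "reads", 3)
stats.backend_exit(101)          # short-lived backend 101 goes away
print(stats.totals()["reads"])   # 8; a per-backend-only view would show 3
```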

I think there are a few existing examples in the current code that
could be repurposed to implement the suggestions above (such as
pg_stat_user_tables, pg_stat_database, etc.). With this in mind, I think
incorporating these changes shouldn’t take significant effort,
considering the benefit they will bring to end users.

Once again, huge respect for your work on these changes, and good luck.

Regards, Alexey
