Re: Asynchronous and "direct" IO support for PostgreSQL.

From: Andres Freund <andres(at)anarazel(dot)de>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Alexey Lesovsky <alexey(dot)lesovsky(at)dataegret(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Asynchronous and "direct" IO support for PostgreSQL.
Date: 2021-02-24 21:23:19
Message-ID: 20210224212319.iv267hqjdyg3fjpk@alap3.anarazel.de
Lists: pgsql-hackers

Hi,

On 2021-02-24 14:59:19 -0500, Greg Stark wrote:
> I guess what I would be looking for in stats would be a way to tell
> what the bandwidth, latency, and queue depth is. Ideally one day
> broken down by relation/index and pg_stat_statement record.

I think doing it at that granularity will likely be too expensive...

> I think seeing the actual in flight async requests in a connection is
> probably not going to be very useful in production.

I think it's good for analyzing concrete performance issues, but
probably not for much more. That said, it wouldn't be hard to build
sampling-based monitoring on top of it with a tiny bit of work (it
should display the relfilenode etc.).
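
To illustrate what sampling on top of an in-flight view could look
like, here is a minimal sketch. It is not taken from the patch: the
row shape (a dict with a "relfilenode" key) and the function name are
hypothetical, and it assumes some external poller collects a list of
snapshots of the in-flight requests at regular intervals.

```python
from collections import Counter


def average_inflight_per_relfilenode(samples):
    """Estimate the average number of in-flight IOs per relfilenode.

    `samples` is a list of snapshots. Each snapshot is a list of rows
    (dicts) describing the requests that were in flight at that moment.
    The "relfilenode" key is an assumed column, not a confirmed one.
    """
    if not samples:
        return {}

    totals = Counter()
    for snapshot in samples:
        for row in snapshot:
            totals[row["relfilenode"]] += 1

    # Dividing by the number of snapshots (including empty ones) gives
    # a sampled approximation of the queue depth per relfilenode.
    return {rel: count / len(samples) for rel, count in totals.items()}
```

Empty snapshots still count towards the denominator, so a relation that
only occasionally has IO in flight shows a correspondingly low average.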

> So number of async reads we've initiated, how many callbacks have been
> called, total cumulative elapsed time between i/o issued and i/o
> completed, total bytes of i/o initiated, total bytes of i/o completed.

Much of that is already in pg_stat_aio_backends - but it is lost after
disconnect (easy to solve). We don't currently track bytes of IO, but
that wouldn't be hard to add.
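
One way to picture the "lost after disconnect" fix is to fold a
backend's counters into a shared total when it exits. The sketch below
only models that idea; it is not how PostgreSQL's statistics
infrastructure works, and every name in it (AioCounters, StatsRegistry,
the counter fields including the byte counters) is hypothetical.

```python
from dataclasses import dataclass, fields


@dataclass
class AioCounters:
    # Hypothetical counters; the byte counters are the ones the text
    # says are not tracked yet.
    issued: int = 0
    completed: int = 0
    bytes_issued: int = 0
    bytes_completed: int = 0

    def add(self, other):
        """Accumulate another set of counters into this one."""
        for f in fields(self):
            setattr(self, f.name,
                    getattr(self, f.name) + getattr(other, f.name))


class StatsRegistry:
    def __init__(self):
        self.live = {}                 # backend id -> AioCounters
        self.departed = AioCounters()  # sum over disconnected backends

    def connect(self, backend_id):
        self.live[backend_id] = AioCounters()
        return self.live[backend_id]

    def disconnect(self, backend_id):
        # Keep the history instead of dropping it with the backend.
        self.departed.add(self.live.pop(backend_id))

    def total(self):
        total = AioCounters()
        total.add(self.departed)
        for counters in self.live.values():
            total.add(counters)
        return total
```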

However, it's surprisingly hard to do the measurement between "issued"
and "completed" in a meaningful way. It's obviously not hard to measure
the time at which the request was issued, but there's no real way to
determine the time at which it was completed. If a backend is busy doing
other things (e.g. invoking aggregate transition functions), we won't see
the completion immediately, and therefore not have an accurate
timestamp.
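
The skew described above can be shown with a small simulation. This is
an illustrative model only, under the assumption that a backend learns
about a completion at the first moment it checks for completions after
the IO has really finished; the function name and the use of integer
microseconds are choices made for the example.

```python
import bisect


def observed_latencies(issue_times, true_durations, poll_times):
    """Return a list of (true_latency, observed_latency) pairs.

    All times are integers in microseconds. `poll_times` are the
    moments at which the backend checks for completed IOs. The
    observed completion time of an IO is the first poll at or after
    its true completion time.
    """
    polls = sorted(poll_times)
    results = []
    for issued, duration in zip(issue_times, true_durations):
        completed = issued + duration
        idx = bisect.bisect_left(polls, completed)
        if idx == len(polls):
            raise ValueError("no poll happens after this IO completes")
        results.append((duration, polls[idx] - issued))
    return results


# Two IOs that each really take 2ms, seen by a backend that only
# checks for completions every 10ms because it is busy elsewhere.
pairs = observed_latencies([0, 1000], [2000, 2000], [10000, 20000])
```

In this example `pairs` is `[(2000, 10000), (2000, 9000)]`: the measured
latency mostly reflects how long the backend was busy, not how long the
IO took.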

With several methods of doing AIO we can set up signals that fire on
completion, but that's pretty darn expensive. And it's painful to write
such signal handlers in a safe way.

> I have some vague idea that we should have a generic infrastructure
> for stats that automatically counts things associated with plan nodes
> and automatically bubbles that data up to the per-transaction,
> per-backend, per-relation, and pg_stat_statements stats. But that's a
> whole other ball of wax :)

Heh, yea, let's tackle that separately ;)

Greetings,

Andres Freund
