Re: Asynchronous and "direct" IO support for PostgreSQL.

From: Greg Stark <stark(at)mit(dot)edu>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Subject: Re: Asynchronous and "direct" IO support for PostgreSQL.
Date: 2021-02-23 19:58:32
Message-ID: CAM-w4HPpgd5TO2t1fXhNDg62EnYiY0aqW+Xa=MDkn0nFqOjrCA@mail.gmail.com
Lists: pgsql-hackers

On Tue, 23 Feb 2021 at 05:04, Andres Freund <andres(at)anarazel(dot)de> wrote:
>
> ## Callbacks
>
> In the core AIO pieces there are two different types of callbacks at the
> moment:
>
> Shared callbacks, which can be invoked by any backend (normally the issuing
> backend / the AIO workers, but can be other backends if they are waiting for
> the IO to complete). For operations on shared resources (e.g. shared buffer
> reads/writes, or WAL writes) these shared callbacks need to transition the
> state of the object the IO is being done for to completion. E.g. for a shared
> buffer read that means setting BM_VALID / unsetting BM_IO_IN_PROGRESS.
>
> The main reason these callbacks exist is that they make it safe for a backend
> to issue non-blocking IO on buffers (see the deadlock section above). As any
> blocked backend can cause the IO to complete, the deadlock danger is gone.
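
The shared-callback idea in the quoted paragraphs can be sketched as a toy, single-process Python model. Only the flag names BM_VALID and BM_IO_IN_PROGRESS come from the text. Their bit values, and BufferDesc, SharedIO, start_read, shared_read_callback and wait_for_io, are hypothetical. Real backends are separate processes synchronising through shared memory, which this model ignores.

```python
from dataclasses import dataclass
from typing import Optional

# Flag names follow the quoted text; the bit values are made up here.
BM_VALID = 1 << 0
BM_IO_IN_PROGRESS = 1 << 1


@dataclass
class BufferDesc:
    """Toy stand-in for a shared buffer header (hypothetical)."""
    flags: int = 0
    data: Optional[bytes] = None


@dataclass
class SharedIO:
    """An in-flight read plus the state its shared callback needs."""
    buf: BufferDesc
    kernel_done: bool = False           # device has finished the transfer
    result: Optional[bytes] = None      # bytes the device delivered
    completed_by: Optional[str] = None  # backend that ran the callback


def start_read(buf: BufferDesc) -> SharedIO:
    # The issuer marks the buffer busy before submitting the read.
    buf.flags |= BM_IO_IN_PROGRESS
    return SharedIO(buf)


def shared_read_callback(req: SharedIO, backend: str) -> None:
    # Touches only shared state, so any backend may run it. A real
    # implementation would need this check-and-set to be atomic.
    if req.completed_by is not None:
        return  # another backend already finished this IO
    req.buf.data = req.result
    req.buf.flags |= BM_VALID
    req.buf.flags &= ~BM_IO_IN_PROGRESS
    req.completed_by = backend


def wait_for_io(req: SharedIO, backend: str) -> None:
    # A waiter does not depend on the issuer making progress: once the
    # transfer is done, the waiter runs the shared callback itself.
    # (A real waiter would block until kernel_done; this model just checks.)
    if req.kernel_done:
        shared_read_callback(req, backend)


buf = BufferDesc()
req = start_read(buf)                   # backend A issues, then blocks elsewhere
req.kernel_done, req.result = True, b"page"
wait_for_io(req, "backend B")           # B, waiting on the buffer, completes it
shared_read_callback(req, "backend A")  # A's later attempt is a no-op
```

Because the callback only touches shared state and a second invocation does nothing, it does not matter whether the issuer or a waiter gets there first, which is the property the quoted text relies on to remove the deadlock danger.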

So firstly, this is all just awesome work. I have questions, but I
don't want them to come across in any way as criticism or as a demand
for more work. This is really great stuff, thank you so much!

The callbacks raise two questions for me:

1) Is there a chance that a backend issues I/O, the I/O completes in
some other backend, and by the time this backend gets around to
looking at the buffer it has already been overwritten again? Do we
have to initiate the I/O again, or have you found a way to arrange
that this backend keeps the buffer pinned from the time the I/O starts
even though it doesn't handle the completion?

2) Have you made (or considered making) things like sequential scans
(or, more likely, bitmap index scans) asynchronous at a higher level?
That is, issue a batch of asynchronous I/Os and then process the pages
and return tuples as the pages arrive. Since sequential scans and
bitmap scans don't guarantee to read the pages in order, they're
generally free to return tuples from any page in any order. I'm not
sure how much of a win that would actually be, since all the same I/O
would still be executed and the savings in shared buffers would be
small. But if most of the pages are hot, you could imagine
interleaving a lot of in-memory pages with the few I/Os instead of
sitting idle waiting for the async I/O to return.
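
One way the scenario in the first question could be ruled out is for the issuing backend's pin to outlive the completion callback. The toy model below illustrates that invariant. Buffer, issue_read, complete_read, try_evict and release are hypothetical names, and the sketch does not claim to describe what the patch actually does.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Buffer:
    """Hypothetical buffer slot: which page it holds and who has it pinned."""
    page_no: Optional[int] = None
    refcount: int = 0
    io_in_progress: bool = False
    valid: bool = False


def issue_read(buf: Buffer, page_no: int) -> None:
    # Assumption being illustrated: the issuer pins the slot *before*
    # submitting the IO and keeps that pin until it has used the page.
    buf.refcount += 1
    buf.page_no = page_no
    buf.io_in_progress = True


def complete_read(buf: Buffer) -> None:
    # Shared completion, possibly run by a different backend: it marks the
    # page valid but does not touch the issuer's pin.
    buf.io_in_progress = False
    buf.valid = True


def try_evict(buf: Buffer, new_page: int) -> bool:
    # Buffer replacement may only recycle an unpinned, idle slot.
    if buf.refcount > 0 or buf.io_in_progress:
        return False
    buf.page_no, buf.valid = new_page, False
    return True


def release(buf: Buffer) -> None:
    assert buf.refcount > 0, "releasing a pin we do not hold"
    buf.refcount -= 1


buf = Buffer()
issue_read(buf, page_no=42)         # backend A issues the read
complete_read(buf)                  # backend B happens to complete it
evicted_early = try_evict(buf, 99)  # another backend wants the slot: refused
seen_by_issuer = buf.page_no        # A finally looks: still page 42
release(buf)                        # A is done with the page
evicted_later = try_evict(buf, 99)  # now the slot can be recycled
```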
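
The higher-level scheme in the second question can be sketched as a generator that hands back pages in readiness order rather than block order. This is a hypothetical model, not an API from the patch: `async_scan` and its `cached` parameter are invented, and the unpredictable order in which real I/Os complete is simulated with a seeded shuffle.

```python
import random
from typing import Iterator, List, Set, Tuple


def async_scan(pages: List[int], cached: Set[int],
               seed: int = 0) -> Iterator[Tuple[int, str]]:
    """Yield (page_no, source) as each page becomes usable.

    `cached` holds the page numbers already valid in shared buffers;
    every other page gets a (simulated) asynchronous read.
    """
    # "Submit" reads for all non-resident pages up front.
    in_flight = [p for p in pages if p not in cached]

    # Hot pages need no I/O, so return them first instead of idling
    # while the reads are outstanding.
    for p in pages:
        if p in cached:
            yield p, "cached"

    # Then return the cold pages in whatever order the reads finish.
    # This relies on the premise in the question that the scan promises
    # no particular page order.
    random.Random(seed).shuffle(in_flight)
    for p in in_flight:
        yield p, "io"


result = list(async_scan(pages=list(range(8)), cached={1, 2, 5, 6}))
```

In a real executor the shuffle would be replaced by whatever completion notification the AIO layer provides; the point of the sketch is only that output order follows readiness, not block number.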

> ## Stats
>
> There are two new views: pg_stat_aios showing AIOs that are currently
> in-progress, pg_stat_aio_backends showing per-backend statistics about AIO.

This is impressive. How easy is it to correlate these views with the
operating system's own AIO statistics?

--
greg
