Re: Streaming I/O, vectored I/O (WIP)

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Melanie Plageman <melanieplageman(at)gmail(dot)com>
Subject: Re: Streaming I/O, vectored I/O (WIP)
Date: 2024-04-02 08:39:49
Message-ID: CA+hUKGJUFkkOkdVq0hug3h1NXVJ0CSA3XTk8XLP2J0y9353ZGw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I had been planning to commit v14 this morning but got cold feet with
the BMR-based interface. Heikki didn't like it much, and in the end,
neither did I. I have now removed it, and it seems much better. No
other significant changes, just parameter types and inlining details.
For example:

* read_stream_begin_relation() now takes a Relation, likes its name says
* StartReadBuffers()'s operation takes smgr and optional rel
* ReadBuffer_common() takes smgr and optional rel

ReadBuffer() (which calls ReadBuffer_common() which calls
StartReadBuffer() as before) now shows no regression in a tight loop
over ~1 million already-in-cache pages (something Heikki had observed
before and could only completely fix with a change that affected all
callers). The same test using read_stream.c is still slightly slower,
~1 million pages -in-cache pages 301ms -> 308ms, which seems
acceptable to me and could perhaps be chased down with more study of
inlining/specialisation. As mentioned before, it doesn't seem to be
measurable once you actually do something with the pages.

In some ways BMR was better than the "fake RelationData" concept
(another attempt at wrestling with the relation vs storage duality,
that is, the online vs recovery duality). But in other ways it was
worse: a weird inconsistent mixture of pass-by-pointer and
pass-by-value interfaces that required several code paths to handle it
being only partially initialised, which turned out to be wasted cycles
implicated in regressions, despite which it is not even very nice to
use anyway. I'm sure it could be made to work better, but I'm not yet
sure it's really needed. In later work for recovery I will need to
add a separate constructor read_stream_begin_smgr_something() anyway
for other reasons (multi-relation streaming, different callback) and
perhaps also a separate StartReadBuffersSmgr() if it saves measurable
cycles to strip out branches. Maybe it was all just premature
pessimisation.

So this is the version I'm going to commit shortly, barring objections.

Attachment Content-Type Size
v15-0001-Provide-vectored-variant-of-ReadBuffer.patch text/x-patch 41.3 KB
v15-0002-Provide-API-for-streaming-relation-data.patch text/x-patch 34.4 KB
v15-0003-Use-streaming-I-O-in-pg_prewarm.patch text/x-patch 2.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Zhijie Hou (Fujitsu) 2024-04-02 08:41:21 RE: Synchronizing slots from primary to standby
Previous Message Bertrand Drouvot 2024-04-02 08:33:40 Re: Introduce XID age and inactive timeout based replication slot invalidation