Re: Streaming I/O, vectored I/O (WIP)

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Streaming I/O, vectored I/O (WIP)
Date: 2024-03-12 10:39:01
Message-ID: CAFiTN-vkrD77SH7JSS_P-07f7H-5uxB4XM_=OVHyumzqJKLqpQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 12, 2024 at 12:10 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
>
> I think you'd be right if StartReadBuffers() were capable of
> processing a sequence consisting of a hit followed by misses, but
> currently it always gives up after the first hit. That is, it always
> processes some number of misses (0-16) and then at most one hit. So
> for now the variable would always turn out to be the same as blockNum.
>
Okay, then shouldn't this "if (found)" block immediately break the
loop so that when we hit the block we just return that block? So it
makes sense what you explained but with the current code if there are
the first few hits followed by misses then we will issue the
smgrprefetch() for the initial hit blocks as well.

+ if (found)
+ {
+ /*
+ * Terminate the read as soon as we get a hit. It could be a
+ * single buffer hit, or it could be a hit that follows a readable
+ * range. We don't want to create more than one readable range,
+ * so we stop here.
+ */
+ actual_nblocks = operation->nblocks = *nblocks = i + 1; (Dilip: I
think we should break after this?)
+ }
+ else
+ {
+ /* Extend the readable range to cover this block. */
+ operation->io_buffers_len++;
+ }
+ }

> The reason is that I wanted to allows "full sized" read system calls
> to form. If you said "hey please read these 16 blocks" (I'm calling
> that "full sized", AKA MAX_BUFFERS_PER_TRANSFER), and it found 2 hits,
> then it could only form a read of 14 blocks, but there might be more
> blocks that could be read after those. We would have some arbitrary
> shorter read system calls, when we wanted to make them all as big as
> possible. So in the current patch you say "hey please read these 16
> blocks" and it returns saying "only read 1", you call again with 15
> and it says "only read 1", and you call again and says "read 16!"
> (assuming 2 more were readable after the original range we started
> with). Then physical reads are maximised. Maybe there is some nice
> way to solve that, but I thought this way was the simplest (and if
> there is some instruction-cache-locality/tight-loop/perf reason why we
> should work harder to find ranges of hits, it could be for later).
> Does that make sense?

Understood, I think this makes sense.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-03-12 10:45:02 Re: speed up a logical replica setup
Previous Message John Naylor 2024-03-12 10:34:30 Re: [PoC] Improve dead tuple storage for lazy vacuum