From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
---|---|
To: | Tomas Vondra <tomas(at)vondra(dot)me> |
Cc: | Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
Subject: | Re: index prefetching |
Date: | 2025-08-12 23:33:57 |
Message-ID: | CAH2-Wzko86NwiENCJGtakJ=fOhWpr-Yz-F+1oxgv2Ku1mvXwvA@mail.gmail.com |
Lists: | pgsql-hackers |
On Tue, Aug 12, 2025 at 7:10 PM Tomas Vondra <tomas(at)vondra(dot)me> wrote:
> Actually, this might be a consequence of how backwards scans work (at
> least in btree). I logged the block in index_scan_stream_read_next, and
> this is what I see in the forward scan (at the beginning):
Just to be clear: you did disable deduplication and then reindex,
right? You're accounting for the known issue with posting lists
returning TIDs in the wrong order, relative to the scan direction
(when the scan direction is backwards)?
It won't be necessary to do this once I commit my patch that fixes the
issue directly, on the nbtree side, but for now deduplication messes
things up here. And so for now you have to work around it.
> But with the backwards scan we apparently scan the values backwards, but
> then the blocks for each value are accessed in forward direction. So we
> do a couple blocks "forward" and then jump to the preceding value - but
> that's a couple blocks *back*. And that breaks the lastBlock check.
I don't think that this should be happening. The read stream ought to
be seeing blocks in exactly the same order as everything else.
> I believe this applies both to master and the prefetching, except that
> master doesn't have read stream - so it only does sync I/O.
In what sense is it an issue on master?
On master, we simply access the TIDs in whatever order amgettuple
returns them. That should always be scan order/index key space
order, where heap TID counts as a tie-breaker/affects the key space in
the presence of duplicates (at least once that issue with posting
lists is fixed, or once deduplication has been disabled in a way that
leaves no posting list TIDs around via a reindex).
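
To make that ordering concrete, here is a minimal sketch in Python. It is a toy model, not nbtree code: the `entries` data and all function names are hypothetical, and it assumes that each group of duplicates sits in a single posting list. It contrasts the expected backwards scan order (the exact reverse of the forwards scan, heap TID tie-breaker included) with the order produced when posting list TIDs come back ascending during a backwards scan.

```python
from itertools import groupby

# Hypothetical index entries: (key, (heap_block, offset)).
entries = [
    (1, (10, 1)), (1, (11, 1)), (1, (12, 1)),
    (2, (20, 1)), (2, (21, 1)), (2, (22, 1)),
]

def forward_scan(entries):
    # Key space order, with heap TID acting as the tie-breaker among duplicates.
    return sorted(entries)

def backward_scan(entries):
    # Expected behavior: the exact reverse of the forwards scan.
    return sorted(entries, reverse=True)

def backward_scan_with_posting_lists(entries):
    # Toy model of the known issue: keys are visited in descending order,
    # but the TIDs inside each group of duplicates are still ascending.
    by_key_desc = sorted(entries, key=lambda e: e[0], reverse=True)
    out = []
    for _, group in groupby(by_key_desc, key=lambda e: e[0]):
        out.extend(sorted(group))
    return out

def heap_blocks(scan):
    # Heap block numbers in the order the scan would request them.
    return [tid[0] for _, tid in scan]

print(heap_blocks(forward_scan(entries)))
print(heap_blocks(backward_scan(entries)))
print(heap_blocks(backward_scan_with_posting_lists(entries)))
```

In this model the third sequence moves a few blocks forward and then jumps back to the preceding value, which is the pattern described in the quoted text above.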
It is certainly not surprising that master does poorly on backwards
scans. And it isn't all that surprising that master does worse on
backwards scans when direct I/O is in use (per the explanation
Andres offered just now). But master should nevertheless always read
the TIDs in whatever order amgettuple returns them.
It sounds like amgetbatch doesn't really behave analogously to master
here, at least with backwards scans. It sounds like you're saying that
we *won't* feed the TIDs' heap block numbers to the read stream in
exactly scan order (when we happen to be scanning backwards) -- which
seems wrong to me.
As you pointed out, a forwards scan of a DESC column index should feed
heap blocks to the read stream in a way that is very similar to an
equivalent backwards scan of a similar ASC column on the same table.
There might be some very minor differences, due to differences in the
precise leaf page boundaries among each of the indexes. But that
should hardly be noticeable at all.
> Could that hide the extra buffer accesses, somehow?
I think that you meant to ask about *missing* buffer hits with the
patch, for the forwards scan. The hit count there doesn't agree with the
backwards scan with the patch, nor does it agree with master (with either
the forwards or backwards scan). Note that the heap accesses themselves
appear to have sane/consistent numbers, since we always see
"read=49933" as expected for those, for all 4 query executions that I
showed.
The "missing buffer hits" issue seems like an issue with the
instrumentation itself. Possibly one that is totally unrelated to
everything else we're discussing.
--
Peter Geoghegan