| From: | Peter Geoghegan <pg(at)bowt(dot)ie> |
|---|---|
| To: | Andres Freund <andres(at)anarazel(dot)de> |
| Cc: | Tomas Vondra <tomas(at)vondra(dot)me>, Alexandre Felipe <o(dot)alexandre(dot)felipe(at)gmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Nazir Bilal Yavuz <byavuz81(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com> |
| Subject: | Re: index prefetching |
| Date: | 2026-02-24 18:13:25 |
| Message-ID: | CAH2-Wzmy7NMba9k8m_VZ-XNDZJEUQBU8TeLEeL960-rAKb-+tQ@mail.gmail.com |
| Lists: | pgsql-hackers |

On Tue, Feb 17, 2026 at 2:27 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> On 2026-02-17 12:16:23 -0500, Peter Geoghegan wrote:
> > > Note that there is pretty much *no* readhead, because the yields happen more
> > > frequently than a io_combine_limit sized IO can be formed.
> >
> > ISTM that we need the yields to better cooperate with whatever's
> > happening on the read stream side.
>
> Plausible. It could be that we could get away with controlling the rampup to
> be slower in potentially problematic cases, without needing the yielding, but
> not sure.

Attached is v11, which makes the read stream yielding mechanism
cooperate better with index prefetching, so as to avoid interfering with
io_combine_limit. This should deal with the odd performance that you
complained about. See
v11-0006-Introduce-read_stream_-pause-resume-yield.patch (and the
later prefetching patch
v11-0007-Add-heapam-index-scan-I-O-prefetching.patch) for details.

The whole idea of measuring "batch distance" is gone in this version,
though we do still only consider whether now is a good time to yield
at "batch boundaries". We always refuse to yield on the first few batches
of the scan, so the idea of caring about batch boundaries is still
there, albeit in a much more limited form.
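
To illustrate the shape of that rule (this is not code from the patch; the struct, the function name, and the threshold of 3 batches are all hypothetical, and the read stream's own criterion for wanting a yield is reduced to a boolean here), a minimal sketch might look like this:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical threshold; the value used by the patch is not stated here. */
#define MIN_BATCHES_BEFORE_YIELD 3

/* Hypothetical per-scan state, not a struct from the patch series. */
typedef struct ScanYieldState
{
    int batches_completed;  /* index batches fully consumed so far */
} ScanYieldState;

/*
 * Decide whether the scan may yield right now.
 *
 * Yielding is only considered at a batch boundary, and is always refused
 * during the first few batches of the scan.  Whether the read stream would
 * like us to yield is an input; how that is determined is outside the
 * scope of this sketch.
 */
static bool
may_yield(const ScanYieldState *state, bool at_batch_boundary,
          bool stream_wants_yield)
{
    assert(state != NULL);

    if (!at_batch_boundary)
        return false;       /* never yield in the middle of a batch */
    if (state->batches_completed < MIN_BATCHES_BEFORE_YIELD)
        return false;       /* too early in the scan to yield */
    return stream_wants_yield;
}
```

The point of the sketch is only that the batch-related checks act as cheap gates in front of whatever the read stream side decides.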

I think that the read stream aspects of this need expert review from
somebody like Melanie, Andres, or Thomas. The new approach to yielding
should definitely be considered a work in progress, though testing
seems to show that it's a step or two in the right direction.

The other focus for v11 has been fixing regressions, particularly with
index-only scans that run on fully cached data. We've added a more
efficient approach to how we manage the memory for batches used for
index-only scans -- they require extra space to store cached
visibility info taken from the visibility map. That cached info is now
more compact, and is allocated as part of the main batch allocation,
avoiding needless palloc churn.
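
As a rough illustration of the single-allocation idea (again not code from the patch: the Batch and BatchItem layouts and all function names are hypothetical, one bit per item is only an assumption about what the compact representation could be, and calloc stands in for palloc), the visibility area can be carved out of the tail of the batch allocation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical item layout, for illustration only. */
typedef struct BatchItem
{
    uint32_t heap_block;
    uint16_t heap_offset;
} BatchItem;

/* Hypothetical batch layout: header, items, then optional visibility bits. */
typedef struct Batch
{
    int       nitems;
    uint8_t  *visible;   /* NULL for plain index scans; otherwise points
                          * into the tail of this same allocation */
    BatchItem items[];   /* flexible array member */
} Batch;

/* Allocate a batch, including the visibility bitmap if it is needed. */
static Batch *
batch_alloc(int nitems, bool index_only)
{
    size_t items_end;
    size_t total;
    Batch *b;

    assert(nitems > 0);
    items_end = offsetof(Batch, items) + sizeof(BatchItem) * (size_t) nitems;
    total = items_end;
    if (index_only)
        total += ((size_t) nitems + 7) / 8;    /* assumed: one bit per item */

    b = calloc(1, total);                      /* one allocation, zeroed */
    if (b == NULL)
        return NULL;
    b->nitems = nitems;
    b->visible = index_only ? (uint8_t *) b + items_end : NULL;
    return b;
}

/* Record that item i is known to be all-visible. */
static void
batch_set_visible(Batch *b, int i)
{
    assert(b->visible != NULL && i >= 0 && i < b->nitems);
    b->visible[i / 8] |= (uint8_t) (1u << (i % 8));
}

/* Check the cached visibility bit for item i. */
static bool
batch_is_visible(const Batch *b, int i)
{
    assert(b->visible != NULL && i >= 0 && i < b->nitems);
    return ((b->visible[i / 8] >> (i % 8)) & 1) != 0;
}
```

With this kind of layout a plain index scan pays nothing for the visibility area, and an index-only scan gets it without a second allocation or a separate free.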

This v11 also includes a new "Don't allocate _bt_search stack" patch,
which avoids allocating memory for an nbtree descent stack during
index scans. I found that this helped a bit with certain index scans
that call _bt_search frequently/descend the index many times. Anything
that avoids the use of palloc in a critical path is a good idea,
especially when doing so is as straightforward as it is with the
_bt_search stack (we simply don't need any stack in the hot _bt_first
code path).

--
Peter Geoghegan