Re: index prefetching

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Georgios <gkokolatos(at)protonmail(dot)com>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Konstantin Knizhnik <knizhnik(at)garret(dot)ru>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Subject: Re: index prefetching
Date: 2024-02-15 04:29:27
Message-ID: CA+TgmoaDfvFF_krboNjjMYzOiPsGM_ioYqfngL-pNaj-nSs1DA@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 14, 2024 at 7:43 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
> I don't think it's just a bookkeeping problem. In a way, nbtree already
> does keep an array of tuples to kill (see btgettuple), but it's always
> for the current index page. So it's not that we immediately go and kill
> the prior tuple - nbtree already stashes it in an array, and kills all
> those tuples when moving to the next index page.
>
> The way I understand the problem is that with prefetching we're bound to
> determine the kill_prior_tuple flag with a delay, in which case we might
> have already moved to the next index page ...

Well... I'm not clear on all of the details of how this works, but
this sounds broken to me, for the reasons that Peter G. mentions in
his comments about desynchronization. If we currently have a rule that
you hold a pin on the index page while processing the heap tuples it
references, you can't just throw that out the window and expect things
to keep working. Saying that kill_prior_tuple doesn't work when you
throw that rule out the window is probably understating the extent of
the problem very considerably.

I would have thought that the way this prefetching would work is that
we would bring pages into shared_buffers sooner than we currently do,
but not actually pin them until we're ready to use them, so that it's
possible they might be evicted again before we get around to them, if
we prefetch too far and the system is too busy. Alternatively, it also
seems OK to read those later pages and pin them right away, as long as
(1) we don't also give up pins that we would have held in the absence
of prefetching and (2) we have some mechanism for limiting the number
of extra pins that we're holding to a reasonable number given the size
of shared_buffers.
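
As a rough illustration of the second option only, here is a toy Python
model of a scan that pins pages ahead of time while keeping the number
of extra pins bounded. It is not PostgreSQL code: the function name, the
`lookahead` and `max_extra_pins` parameters, and the representation of
pages as plain integers are all assumptions made for this sketch. How
the cap would be derived from the size of shared_buffers is left open.

```python
from collections import deque


def scan_with_pinned_lookahead(pages, lookahead, max_extra_pins):
    """Toy model: walk `pages` in order, pinning some pages ahead of time.

    The pin on the current page is held for as long as that page is being
    processed, as it would be without prefetching. Pins taken ahead of
    time are "extra" and are capped at `max_extra_pins` (a hypothetical
    limit). Assumes the page numbers in `pages` are unique.

    Returns (pages in processing order, peak number of extra pins held).
    """
    ahead = deque()  # pages pinned ahead of the current position
    next_to_pin = 0  # index of the next page eligible for an early pin
    processed = []
    peak_extra = 0
    distance = min(lookahead, max_extra_pins)

    for i, page in enumerate(pages):
        # If this page was pinned ahead of time, that pin now becomes the
        # regular pin on the current page; otherwise we would pin it here.
        if ahead and ahead[0] == page:
            ahead.popleft()
        next_to_pin = max(next_to_pin, i + 1)

        # Pin further pages, but never exceed the cap on extra pins.
        while next_to_pin < len(pages) and len(ahead) < distance:
            ahead.append(pages[next_to_pin])
            next_to_pin += 1

        peak_extra = max(peak_extra, len(ahead))
        processed.append(page)  # current page stays pinned while processed

    return processed, peak_extra


if __name__ == "__main__":
    order, peak = scan_with_pinned_lookahead(
        [10, 11, 12, 13, 14], lookahead=3, max_extra_pins=2
    )
    print(order, peak)
```

In this model the requested lookahead of 3 is cut down to 2 by the cap,
and the current page's pin is never released early; only the number of
pages pinned ahead varies.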

However, it doesn't seem OK at all to give up pins sooner than the
current code would.

--
Robert Haas
EDB: http://www.enterprisedb.com
