Re: should vacuum's first heap pass be read-only?

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: should vacuum's first heap pass be read-only?
Date: 2022-04-05 22:25:50
Message-ID: CAH2-WzmQN4bcs8XmFrd3d9Gsk24AKdL5EQa4Pg0hYEko40--LQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 5, 2022 at 2:53 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Tue, Apr 5, 2022 at 4:30 PM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> > On Tue, Apr 5, 2022 at 1:10 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > > I had assumed that this would not be the case, because if the page is
> > > being accessed by the workload, it can be pruned - and probably frozen
> > > too, if we wanted to write code for that and spend the cycles on it -
> > > and if it isn't, pruning and freezing probably aren't needed.
> >
> > [ a lot of things ]
>
> I don't understand what any of this has to do with the point I was raising here.

Why do you assume that we'll ever have an accurate idea of how many
LP_DEAD items there are, before we've looked? And if we're wrong about
that, persistently, why should anything else we think about it really
matter? This is an inherently dynamic and cyclic process. Statistics
don't really work here. That was how my remarks were related to yours.
That should be in scope -- getting better information about what work
we need to do by blurring the boundaries between deciding what to do,
and executing that plan.

On a long enough timeline the LP_DEAD items in heap pages are bound to
become the dominant concern in almost any interesting case for the
conveyor belt, for the obvious reason: you can't do anything about
LP_DEAD items without also doing every other piece of processing
involving those same heap pages. So in that sense, yes, they will be
the dominant problem at times, for sure.

On the other hand it seems very hard to imagine an interesting
scenario in which LP_DEAD items are the dominant problem from the
earliest stage of processing by VACUUM. But even if it were somehow
possible, would it matter? That would mean that there'd be occasional
instances of the conveyor belt being ineffective -- hardly the end of
the world. What would it cost us to keep it as an option that went
unused? I don't think we'd have to do any extra work, other than
in-memory bookkeeping.

--
Peter Geoghegan
