Re: Emit fewer vacuum records by reaping removable tuples during pruning

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Peter Geoghegan <pg(at)bowt(dot)ie>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date: 2024-01-11 19:30:07
Message-ID: CAAKRu_Y5FrihOgMeP7DsSYORP65+96-da2g18Xsib+r32ScddQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Attached v7 is rebased over the commit you just made to remove
LVPagePruneState->hastup.

On Thu, Jan 11, 2024 at 11:54 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Wed, Jan 10, 2024 at 5:28 PM Melanie Plageman
> <melanieplageman(at)gmail(dot)com> wrote:
> > Yes, the options I can think of are:
> >
> > 1) rename the parameter to "immed_reap" or similar and make very clear
> > in heap_page_prune_opt() that you are not to pass true.
> > 2) make immediate reaping work for on-access pruning. I would need a
> > low cost way to find out if there are any indexes on the table. Do you
> > think this is possible? Should we just rename the parameter for now
> > and think about that later?
>
> I don't think we can implement (2), because:
>
> robert.haas=# create table test (a int);
> CREATE TABLE
> robert.haas=# begin;
> BEGIN
> robert.haas=*# select * from test;
> a
> ---
> (0 rows)
>
> <in another window>
>
> robert.haas=# create index on test (a);
> CREATE INDEX
>
> In English, the lock we hold during regular table access isn't strong
> enough to foreclose concurrent addition of an index.

Ah, I see.

> So renaming the parameter seems like the way to go. How about "mark_unused_now"?

I've done this in attached v7.

> > > - if (!ItemIdIsUsed(itemid) || ItemIdIsDead(itemid))
> > > + if (!ItemIdIsUsed(itemid))
> > > continue;
> > >
> > > There is a bit of overhead here for the !no_indexes case. I assume it
> > > doesn't matter.
> >
> > Right. Should be okay. Alternatively, and I'm not saying this is a
> > good idea, but we could throw this into the loop in heap_page_prune()
> > which calls heap_prune_chain():
> >
> > + if (ItemIdIsDead(itemid) && prstate.no_indexes)
> > + {
> > + heap_prune_record_unused(&prstate, offnum);
> > + continue;
> > + }
> >
> > I think that is correct?
>
> Wouldn't that be wrong in any case where heap_prune_chain() calls
> heap_prune_record_dead()?

Hmm. But previously, we skipped heap_prune_chain() for already dead
line pointers. In my patch, I rely on the test in the loop in
heap_prune_chain() to set those LP_UNUSED if mark_unused_now is true
(previously the below code just broke out of the loop).

/*
* Likewise, a dead line pointer can't be part of the chain. (We
* already eliminated the case of dead root tuple outside this
* function.)
*/
if (ItemIdIsDead(lp))
{
/*
* If the caller set mark_unused_now true, we can set dead line
* pointers LP_UNUSED now. We don't increment ndeleted here since
* the LP was already marked dead.
*/
if (unlikely(prstate->mark_unused_now))
heap_prune_record_unused(prstate, offnum);

break;
}

so wouldn't what I suggested simply set the item LP_UNSED that before
invoking heap_prune_chain()? Thereby achieving the same result without
invoking heap_prune_chain() for already dead line pointers? I could be
missing something. That heap_prune_chain() logic sometimes gets me
turned around.

> > Yes, so I preferred it in the body of heap_prune_chain() (the caller).
> > Andres didn't like the extra level of indentation. I could wrap
> > heap_record_dead() and heap_record_unused(), but I couldn't really
> > think of a good wrapper name. heap_record_dead_or_unused() seems long
> > and literal. But, it might be the best I can do. I can't think of a
> > general word which encompasses both the transition to death and to
> > disposal.
>
> I'm sure we could solve the wordsmithing problem but I think it's
> clearer if the heap_prune_record_* functions don't get clever.

I've gone with my heap_record_dead_or_unused() suggestion.

- Melanie

Attachment Content-Type Size
v7-0002-Move-VM-update-code-to-lazy_scan_prune.patch text/x-patch 10.8 KB
v7-0003-Inline-LVPagePruneState-members-in-lazy_scan_prun.patch text/x-patch 11.8 KB
v7-0001-Set-would-be-dead-items-LP_UNUSED-while-pruning.patch text/x-patch 17.0 KB
v7-0004-Move-freespace-map-update-into-lazy_scan_-no-prun.patch text/x-patch 9.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Lakhin 2024-01-11 20:00:01 Re: Test slots invalidations in 035_standby_logical_decoding.pl only if dead rows are removed
Previous Message Robert Haas 2024-01-11 19:12:20 buildfarm failures in pg_walsummary checks