Re: Emit fewer vacuum records by reaping removable tuples during pruning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Melanie Plageman <melanieplageman(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, Peter Geoghegan <pg(at)bowt(dot)ie>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Emit fewer vacuum records by reaping removable tuples during pruning
Date: 2024-01-08 20:10:33
Message-ID: CA+TgmoaDAMafeUA38cKrDT4A29agP1tt2O9bSbJ+n9hQdj_Hbg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 5, 2024 at 3:57 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> > I will be astonished if you can make this work well enough to avoid
> > huge regressions in plausible cases. There are plenty of cases where
> > we do a very thorough job opportunistically removing index tuples.
>
> These days the AM is often involved with that, via
> table_index_delete_tuples()/heap_index_delete_tuples(). That IIRC has to
> happen before physically removing the already-marked-killed index entries. We
> can't rely on being able to actually prune the heap page at that point, there
> might be other backends pinning it, but often we will be able to. If we were
> to prune below heap_index_delete_tuples(), we wouldn't need to recheck that
> index again during "individual tuple pruning", if the to-be-marked-unused heap
> tuple is one of the tuples passed to heap_index_delete_tuples(). Which
> presumably will be very commonly the case.
>
> At least for nbtree, we are much more aggressive about marking index entries
> as killed, than about actually removing the index entries. "individual tuple
> pruning" would have to look for killed-but-still-present index entries, not
> just for "live" entries.

I don't want to derail this thread, but I don't really see what you
have in mind here. The first paragraph sounds like you're imagining
that while pruning the index entries we might jump over to the heap
and clean things up there, too, but that seems like it wouldn't work
if the table has more than one index. I thought you were talking about
starting with a heap tuple and bouncing around to every index to see
if we can find index pointers to kill in every one of them. That
*could* work out, but you only need one index to have been
opportunistically cleaned up in order for it to fail to work out.
There might well be some workloads where that's often the case, but
the regressions in the workloads where it isn't the case seem like
they would be rather substantial, because doing an extra lookup in
every index for each heap tuple visited sounds pricey.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-01-08 20:11:30 Re: verify predefined LWLocks have entries in wait_event_names.txt
Previous Message Rafsun Masud 2024-01-08 20:06:22 Re: printing raw parse tree