Re: should vacuum's first heap pass be read-only?

From: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: should vacuum's first heap pass be read-only?
Date: 2022-04-05 10:19:09
Message-ID: CAFiTN-v+WDWay+cmwu2bPQyb-bvSbn7MiT8DmVohvcfVfCA3fQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 1, 2022 at 11:34 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>
> On Fri, Apr 1, 2022 at 12:08 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:
> > After thinking more about this I see there is some value of
> > remembering the dead tids in the conveyor belt. Basically, the point
> > is if there are multiple indexes and we do the index vacuum for some
> > of the indexes and skip for others. And now when we again do the
> > complete vacuum cycle that time we will again get all the old dead
> > tids + the new dead tids but without conveyor belt we might need to
> > perform the multiple cycle of the index vacuum even for the indexes
> > for which we had done the vacuum in previous vacuum cycle (if all tids
> > are not fitting in maintenance work mem). But with the conveyor belt
> > we remember the conveyor belt pageno upto which we have done the index
> > vacuum and then we only need to do vacuum for the remaining tids which
> > will definitely reduce the index vacuuming passes, right?
>
> I guess you're right, and it's actually a little bit better than that,
> because even if the data does fit into shared memory, we'll have to
> pass fewer TIDs to the worker to be removed from the heap, which might
> save a few CPU cycles. But I think both of those are very small
> benefits. If that's all we're going to do with the conveyor belt
> infrastructure, I don't think it's worth the effort.

I don't think that saving extra index passes is really a small gain.
I think it will save a lot of I/O when the index pages are not in
shared buffers, because we can completely skip the index pass for any
index that has already been vacuumed. If this were the only advantage,
it might not be worth adding this infrastructure, but what about
global indexes?
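
To make the saving concrete, here is a rough sketch (not PostgreSQL
code) of per-index progress tracking. The ConveyorBelt class, its
methods, and the index names are all hypothetical, and a plain list
offset stands in for the conveyor belt pageno mentioned above.

```python
from dataclasses import dataclass, field


@dataclass
class ConveyorBelt:
    """Hypothetical append-only log of dead TIDs (not a PostgreSQL API)."""
    tids: list = field(default_factory=list)
    # Per-index offset into `tids` up to which that index has been vacuumed.
    done_upto: dict = field(default_factory=dict)

    def append(self, new_tids):
        """Record dead TIDs found by a heap pass."""
        self.tids.extend(new_tids)

    def pending(self, index_name):
        """Dead TIDs this index still has to remove."""
        return self.tids[self.done_upto.get(index_name, 0):]

    def vacuum_index(self, index_name):
        """Pretend to vacuum the index; return how many TIDs it processed."""
        todo = self.pending(index_name)
        self.done_upto[index_name] = len(self.tids)
        return len(todo)


belt = ConveyorBelt()
belt.append([(1, 1), (1, 2), (2, 5)])  # first heap pass finds 3 dead TIDs
belt.vacuum_index("idx_a")             # idx_a is vacuumed, idx_b is skipped
belt.append([(3, 7), (4, 1)])          # a later heap pass finds 2 more

work_a = belt.vacuum_index("idx_a")    # only the 2 new TIDs
work_b = belt.vacuum_index("idx_b")    # all 5 TIDs
```

In this toy model idx_a only has to look at the TIDs added since its
last vacuum, while idx_b, which was skipped earlier, processes the
whole backlog.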

If we have global indexes, then we need this infrastructure to store
the dead items for each partition. For example, if we have already
vacuumed 1000 partitions and need to vacuum the global index while
vacuuming the 1001st partition, we don't want to rescan the previous
1000 partitions to regenerate their old dead items, right? So I think
this is the actual use case, where we indirectly skip the heap
vacuuming for some of the partitions before performing the index
vacuum.
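
As a rough illustration of the global index case (again only a sketch,
with made-up function names and a made-up partition layout), each
partition's heap is scanned once, its dead items are stored, and the
global index is vacuumed from the stored items:

```python
def scan_partition_for_dead_items(partition, scan_counter):
    """Stand-in for a heap pass over one partition; counts how often it runs."""
    name = partition["name"]
    scan_counter[name] = scan_counter.get(name, 0) + 1
    return [(name, tid) for tid in partition["dead_tids"]]


def vacuum_partitions(partitions, global_index_every):
    """Heap-scan each partition once, store its dead items, and vacuum the
    global index after every `global_index_every` partitions."""
    scans = {}                 # partition name -> number of heap scans
    stored = []                # remembered dead items, tagged by partition
    global_index_passes = []   # number of dead items handled in each pass
    for i, part in enumerate(partitions, start=1):
        stored.extend(scan_partition_for_dead_items(part, scans))
        if i % global_index_every == 0:
            global_index_passes.append(len(stored))
            stored.clear()     # these entries are now gone from the index
    return scans, global_index_passes


# 1001 hypothetical partitions with two dead TIDs each; the global index
# is vacuumed only once, while processing the 1001st partition.
parts = [{"name": f"p{n}", "dead_tids": [1, 2]} for n in range(1001)]
scans, passes = vacuum_partitions(parts, global_index_every=1001)
```

Because the dead items of the first 1000 partitions were remembered,
no partition is scanned a second time when the global index is finally
vacuumed.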

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
