Re: decoupling table and index vacuum

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: decoupling table and index vacuum
Date: 2021-04-22 18:47:14
Message-ID: CA+Tgmob6CnSyyB+tU52727nMYnmecq00g-c0jV-y5GhqVh83NQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 22, 2021 at 10:28 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> The dead TID fork needs to also be efficiently searched. If the heap
> scan runs twice, the collected dead TIDs on each heap pass could be
> overlapped. But we would not be able to merge them if we did index
> vacuuming on one of indexes at between those two heap scans. The
> second time heap scan would need to record only TIDs that are not
> collected by the first time heap scan.

I agree that there's a problem here. It seems to me that it's probably
possible to have a dead TID fork that implements "throw away the
oldest stuff" efficiently, and it's probably also possible to have a
TID fork that can be searched efficiently. However, I am not sure that
it's possible to have a dead TID fork that does both of those things
efficiently. Maybe you have an idea. My intuition is that if we have
to pick one, it's MUCH more important to be able to throw away the
oldest stuff efficiently. I think we can work around the lack of
efficient lookup, but I don't see a way to work around the lack of an
efficient operation to discard the oldest stuff.
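
To make the trade-off concrete, here is a rough sketch of one way it could
look, not a proposed on-disk format. It assumes the dead TIDs are kept as a
sequence of sorted runs, one per heap pass, oldest first. The class name
DeadTidStore and its methods are hypothetical and exist only for this
illustration. Discarding the oldest run is cheap, while a lookup has to
binary-search every run, which is the "work around the lack of efficient
lookup" side of the bargain.

```python
from bisect import bisect_left
from collections import deque


class DeadTidStore:
    """Hypothetical in-memory stand-in for a dead TID fork."""

    def __init__(self):
        # Each run is a sorted list of (block, offset) TIDs collected by one
        # heap pass; runs are kept oldest-first.
        self.runs = deque()

    def contains(self, tid):
        # There is no global index, so binary-search each run in turn.
        # Cost grows with the number of runs: O(runs * log n).
        for run in self.runs:
            i = bisect_left(run, tid)
            if i < len(run) and run[i] == tid:
                return True
        return False

    def add_heap_pass(self, tids):
        # Record only TIDs that no earlier pass has already collected.
        new_run = sorted(t for t in set(tids) if not self.contains(t))
        if new_run:
            self.runs.append(new_run)
        return len(new_run)

    def discard_oldest(self):
        # Dropping the oldest run is a constant-time operation.
        return self.runs.popleft() if self.runs else []


store = DeadTidStore()
first = store.add_heap_pass([(1, 2), (1, 5), (3, 1)])
second = store.add_heap_pass([(1, 5), (4, 7)])  # (1, 5) is already known
dropped = store.discard_oldest()
```

In this sketch the second pass stores only (4, 7), and discarding the oldest
run never touches the newer one. The cost is that every lookup gets slower as
runs accumulate.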

> Right. Given decoupling index vacuuming, I think the index’s garbage
> statistics are important which preferably need to be fetchable without
> accessing indexes. It would be not hard to estimate how many index
> tuples might be able to be deleted by looking at the dead TID fork but
> it doesn’t necessarily match the actual number.

Right, and to appeal (I think) to Peter's quantitative vs. qualitative
principle, it could be way off. Like, we could have a billion dead
TIDs and in one index the number of index entries that need to be
cleaned out could be 1 billion and in another index it could be zero.
We know how much data we will need to scan because we can fstat()
the index, but there seems to be no easy way to estimate how many of
those pages we'll need to dirty, because we don't know how successful
previous opportunistic cleanup has been. It is not impossible, as
Peter has pointed out a few times now, that it has worked perfectly
and there will be no modifications required, but it is also possible
that it has done nothing.
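
The scan-size half of that is simple enough to sketch. The snippet below is
only an illustration under stated assumptions: the helper name pages_to_scan
is made up, the caller is assumed to already know the paths of the index's
segment files, and the block size is assumed to be the default 8192 bytes.
It says nothing about how many of those pages would end up dirtied, which is
the part that has no easy estimate.

```python
import os

BLCKSZ = 8192  # assumed: PostgreSQL's default block size


def pages_to_scan(segment_paths):
    """Return the total number of whole blocks across the given files."""
    total = 0
    for path in segment_paths:
        fd = os.open(path, os.O_RDONLY)
        try:
            # fstat() reports the file size without reading any data.
            total += os.fstat(fd).st_size // BLCKSZ
        finally:
            os.close(fd)
    return total
```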

--
Robert Haas
EDB: http://www.enterprisedb.com
