Re: decoupling table and index vacuum

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Peter Geoghegan <pg(at)bowt(dot)ie>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: decoupling table and index vacuum
Date: 2021-04-23 15:21:52
Message-ID: CA+Tgmoba6s1Tx6wM_L-EzuTp_7-5xLNrgPTGR9Jf1Emnb4YTBA@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 23, 2021 at 7:04 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> I think we can divide the TID fork into 16MB or 32MB chunks like WAL
> segment files so that we can easily remove old chunks. Regarding the
> efficient search part, I think we need to consider the case where the
> TID fork gets bigger than maintenance_work_mem. In that case, during
> the heap scan, we need to check if the discovered TID exists in a
> chunk of the TID fork that could be on the disk. Even if all
> known-dead-TIDs are loaded into an array on the memory, it could get
> much slower than the current heap scan to bsearch over the array for
> each dead TID discovered during heap scan. So it would be better to
> have a way to skip searching by already recorded TIDs. For example,
> during heap scan or HOT pruning, I think that when marking TIDs dead
> and recording it to the dead TID fork we can mark them “dead and
> recorded” instead of just “dead” so that future heap scans can skip
> those TIDs without existence check.

I'm not very excited about this. If we did this, and if we ever
generated dead-but-not-recorded TIDs, then we would potentially have to
dirty those blocks again later to mark them recorded.

Also, if bsearch() is a bottleneck, how about just using an O(1)
algorithm instead of an O(lg n) algorithm, rather than changing the
on-disk format?
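
A minimal sketch of what an expected-O(1) lookup could look like, assuming
the known-dead TIDs fit in memory: a fixed-size open-addressing hash set
keyed on (block, offset). All names here (DeadTid, TidSet, tidset_*) are
hypothetical and are not PostgreSQL APIs; the only PostgreSQL fact relied
on is that line pointer offsets start at 1, so a packed key is never 0.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for a heap TID: block number plus line pointer offset. */
typedef struct { uint32_t block; uint16_t offset; } DeadTid;

/* Fixed-size open-addressing set; a slot value of 0 means "empty". */
typedef struct {
    uint64_t *slots;
    unsigned  bits;   /* table has 1 << bits slots */
    size_t    nused;
} TidSet;

/* Pack a TID into a nonzero 64-bit key (assumes offset >= 1). */
static uint64_t tid_key(DeadTid t)
{
    return ((uint64_t) t.block << 16) | t.offset;
}

/* Multiplicative hash: keep the top 'bits' bits of the product. */
static size_t tid_slot(const TidSet *s, uint64_t key)
{
    return (size_t) ((key * UINT64_C(0x9E3779B97F4A7C15)) >> (64 - s->bits));
}

bool tidset_init(TidSet *s, unsigned bits)
{
    s->slots = NULL;
    s->bits = bits;
    s->nused = 0;
    if (bits < 1 || bits > 30)
        return false;
    s->slots = calloc((size_t) 1 << bits, sizeof(uint64_t));
    return s->slots != NULL;
}

void tidset_free(TidSet *s)
{
    free(s->slots);
    s->slots = NULL;
}

/* Returns false once the table is half full; the caller should stop adding. */
bool tidset_insert(TidSet *s, DeadTid t)
{
    uint64_t key = tid_key(t);
    size_t   mask = ((size_t) 1 << s->bits) - 1;
    size_t   i;

    for (i = tid_slot(s, key); s->slots[i] != 0; i = (i + 1) & mask)
        if (s->slots[i] == key)
            return true;            /* already recorded */
    if (s->nused * 2 >= mask + 1)
        return false;               /* keep load factor at or below 1/2 */
    s->slots[i] = key;
    s->nused++;
    return true;
}

/* Expected O(1) probe, in place of bsearch() over a sorted array. */
bool tidset_contains(const TidSet *s, DeadTid t)
{
    uint64_t key = tid_key(t);
    size_t   mask = ((size_t) 1 << s->bits) - 1;

    for (size_t i = tid_slot(s, key); s->slots[i] != 0; i = (i + 1) & mask)
        if (s->slots[i] == key)
            return true;
    return false;
}
```

The table is deliberately not resizable: in this sketch its size would be
derived from the memory budget up front, and a failed insert is the signal
that the budget is exhausted.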

Also, can you clarify exactly what you think the problem case is here?
It seems to me that:

1. If we're pruning the heap to collect dead TIDs, we should stop when
the TIDs we've accumulated fill maintenance_work_mem. It
is possible that we could find when starting to prune that there are
*already* more dead TIDs than will fit, because maintenance_work_mem
might have been reduced since they were gathered. But that's OK: we
can figure out that there are more than will fit without loading them
all, and since we shouldn't do additional pruning in this case,
there's no issue.

2. If we're sanitizing indexes, we should normally discover that there
are few enough TIDs that we can still fit them all in memory. But if
that proves not to be the case, again because, for example,
maintenance_work_mem has been reduced, then we can handle that with
multiple index passes just as we do today.

3. If we're going back to the heap to permit TIDs to be recycled by
setting dead line pointers to unused, we can load in as many of those
as will fit in maintenance_work_mem, sort them by block number, and go
through block by block and DTRT. Then, we can release all that memory
and, if necessary, do the whole thing again. This isn't even
particularly inefficient.
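
The batch-and-sort loop in point 3 can be sketched roughly as follows. This
is an illustration under stated assumptions, not the actual vacuum code: the
in-memory array "all" stands in for the on-disk dead-TID storage, batch_cap
stands in for however many TIDs fit in maintenance_work_mem, and the
DeadTid, BlockFn, count_tids and recycle_in_batches names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a heap TID: block number plus line pointer offset. */
typedef struct { uint32_t block; uint16_t offset; } DeadTid;

/* Hypothetical per-block callback: would set these dead line pointers unused. */
typedef void (*BlockFn)(uint32_t block, const DeadTid *tids, size_t ntids,
                        void *arg);

/* Order by block number first, then by offset within the block. */
static int cmp_tid(const void *a, const void *b)
{
    const DeadTid *x = a;
    const DeadTid *y = b;

    if (x->block != y->block)
        return x->block < y->block ? -1 : 1;
    if (x->offset != y->offset)
        return x->offset < y->offset ? -1 : 1;
    return 0;
}

/* Example callback: just counts how many TIDs were handed over. */
void count_tids(uint32_t block, const DeadTid *tids, size_t ntids, void *arg)
{
    (void) block;
    (void) tids;
    *(size_t *) arg += ntids;
}

/*
 * Load up to batch_cap TIDs, sort them, visit each block once per batch,
 * then reuse the same memory for the next batch. Returns the number of
 * block visits (0 if batch_cap is 0 or the allocation fails).
 */
size_t recycle_in_batches(const DeadTid *all, size_t nall, size_t batch_cap,
                          BlockFn fn, void *arg)
{
    size_t   visits = 0;
    DeadTid *batch;

    if (batch_cap == 0)
        return 0;
    batch = malloc(batch_cap * sizeof(DeadTid));
    if (batch == NULL)
        return 0;

    for (size_t done = 0; done < nall; )
    {
        size_t n = nall - done < batch_cap ? nall - done : batch_cap;

        memcpy(batch, all + done, n * sizeof(DeadTid));  /* "load" a batch */
        qsort(batch, n, sizeof(DeadTid), cmp_tid);

        /* Each run of equal block numbers becomes one block visit. */
        for (size_t i = 0; i < n; )
        {
            size_t j = i + 1;

            while (j < n && batch[j].block == batch[i].block)
                j++;
            fn(batch[i].block, &batch[i], j - i, arg);
            visits++;
            i = j;
        }
        done += n;
    }
    free(batch);
    return visits;
}
```

A block whose TIDs are split across two batches gets visited twice, which
is the cost of a small memory budget in this sketch; with a budget large
enough to hold everything, each block is visited exactly once.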

--
Robert Haas
EDB: http://www.enterprisedb.com
