Re: decoupling table and index vacuum

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: decoupling table and index vacuum
Date: 2021-09-24 05:41:35
Message-ID: CAD21AoAG3zHmPcKtwguE=1Qyq0bM1hdoOQmSC7hQ0FvExLNY_w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Sep 16, 2021 at 7:09 AM Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Wed, Apr 21, 2021 at 8:21 AM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > Now, the reason for this is that when we discover dead TIDs, we only
> > record them in memory, not on disk. So, as soon as VACUUM ends, we
> > lose all knowledge of whether those TIDs were and must rediscover
> > them. Suppose we didn't do this, and instead had a "dead TID" fork for
> > each table. Suppose further that this worked like a conveyor belt,
> > similar to WAL, where every dead TID we store into the fork is
> > assigned an identifying 64-bit number that is never reused.
>
> Enabling index-only scans is a good enough reason to pursue this
> project, even on its own.

+1

> I attached a POC autovacuum logging instrumentation patch that shows
> how VACUUM uses *and* sets VM bits.

Logging how vacuum uses and sets VM bits seems a good idea.

I've read the proposed PoC patch. Probably it's better to start a new
thread for this patch and write the comment for it there but let me
leave one comment on the patch:

With the patch, we increment allfrozen_pages counter, which is used to
determine whether or not we advance relfrozenxid and relminmxid, at
two places:

@@ -1141,7 +1201,9 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams
*params, bool aggressive)
* in this case an approximate answer is OK.
*/
if (aggressive ||
VM_ALL_FROZEN(vacrel->rel, blkno, &vmbuffer))
- vacrel->frozenskipped_pages++;
+ vacrel->allfrozen_pages++;
+ else
+ vacrel->allvisible_pages++;
continue;

@@ -1338,6 +1400,8 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams
*params, bool aggressive)
*/
if (!PageIsAllVisible(page))
{
+ vacrel->allfrozen_pages++;
+

I think that we will end up doubly counting the page as scanned_pages
and allfrozen_pages due to the newly added latter change. This seems
wrong to me because we calculate as follows:

@@ -644,7 +656,7 @@ heap_vacuum_rel(Relation rel, VacuumParams *params,
* NB: We need to check this before truncating the relation,
because that
* will change ->rel_pages.
*/
- if ((vacrel->scanned_pages + vacrel->frozenskipped_pages)
+ if ((vacrel->scanned_pages + vacrel->allfrozen_pages)
< vacrel->rel_pages)
{
Assert(!aggressive);

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2021-09-24 05:51:00 Re: Added schema level support for publication.
Previous Message Dilip Kumar 2021-09-24 05:36:12 Re: row filtering for logical replication