Re: Testing autovacuum wraparound (including failsafe)

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Testing autovacuum wraparound (including failsafe)
Date: 2022-06-30 01:40:12
Message-ID: CAD21AoDVhkXp8HjpFO-gp3TgL6tCKcZQNxn04m01VAtcSi-5sA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Feb 1, 2022 at 11:58 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Jun 11, 2021 at 10:19 AM Andres Freund <andres(at)anarazel(dot)de> wrote:
> >
> > Hi,
> >
> > On 2021-06-10 16:42:01 +0300, Anastasia Lubennikova wrote:
> > > Cool. Thank you for working on that!
> > > Could you please share a WIP patch for the $subj? I'd be happy to help with
> > > it.
> >
> > I've attached the current WIP state, which hasn't evolved much since
> > this message... I put the test in src/backend/access/heap/t/001_emergency_vacuum.pl
> > but I'm not sure that's the best place. But I didn't think
> > src/test/recovery is great either.
> >
>
> Thank you for sharing the WIP patch.
>
> Regarding point (1) you mentioned (StartupSUBTRANS() takes a long time
> for zeroing out all pages), how about using single-user mode instead
> of preparing the transaction? That is, after pg_resetwal we check the
> ages of datfrozenxid by executing a query in single-user mode. That
> way, we don’t need to worry about autovacuum concurrently running
> while checking the ages of frozenxids. I’ve attached a PoC patch that
> does the scenario like:
>
> 1. start cluster with autovacuum=off and create tables with a few data
> and make garbage on them
> 2. stop cluster and do pg_resetwal
> 3. start cluster in single-user mode
> 4. check age(datfrozenxid)
> 5. stop cluster
> 6. start cluster and wait for autovacuums to increase template0,
> template1, and postgres datfrozenxids

The above steps are wrong.

I think we can expose a function in an extension used only by this
test in order to set nextXid to a future value with zeroing out
clog/subtrans pages. We don't need to fill all clog/subtrans pages
between oldestActiveXID and nextXid. I've attached a PoC patch for
adding this regression test and am going to register it to the next
CF.

BTW, while testing the emergency situation, I found there is a race
condition where anti-wraparound vacuum isn't invoked with the settings
autovacuum = off, autovacuum_max_workers = 1. AN autovacuum worker
sends a signal to the postmaster after advancing datfrozenxid in
SetTransactionIdLimit(). But with the settings, if the autovacuum
launcher attempts to launch a worker before the autovacuum worker who
has signaled to the postmaster finishes, the launcher exits without
launching a worker due to no free workers. The new launcher won’t be
launched until new XID is generated (and only when new XID % 65536 ==
0). Although autovacuum_max_workers = 1 is not mandatory for this
test, it's easier to verify the order of operations.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

Attachment Content-Type Size
v1-0001-Add-regression-tests-for-emergency-vacuums.patch application/octet-stream 10.0 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Yugo NAGATA 2022-06-30 01:48:12 Add a test for "cannot truncate foreign table"
Previous Message Robert Haas 2022-06-30 01:39:56 Re: Emit extra debug message when executing extension script.