Re: Testing autovacuum wraparound (including failsafe)

From:	Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>
To:	Andres Freund <andres(at)anarazel(dot)de>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject:	Re: Testing autovacuum wraparound (including failsafe)
Date:	2021-06-10 13:42:01
Message-ID:	CAP4vRV5gEHFLB7NwOE6_dyHAeVfkvqF8Z_g5GaCQZNgBAE0Frw@mail.gmail.com
Lists:	pgsql-hackers

On Thu, Jun 10, 2021 at 10:52 AM Andres Freund <andres(at)anarazel(dot)de> wrote:

>
> I started to write a test for $Subject, which I think we sorely need.
>
> Currently my approach is to:
> - start a cluster, create a few tables with test data
> - acquire SHARE UPDATE EXCLUSIVE in a prepared transaction, to prevent
> autovacuum from doing anything
> - cause dead tuples to exist
> - restart
> - run pg_resetwal -x 2000027648
> - do things like acquiring pins on pages that block vacuum from progressing
> - commit prepared transaction
> - wait for template0, template1 datfrozenxid to increase
> - wait for relfrozenxid for most relations in postgres to increase
> - release buffer pin
> - wait for postgres datfrozenxid to increase
>
>
Cool. Thank you for working on that!
Could you please share a WIP patch for the $subj? I'd be happy to help with
it.

> So far so good. But I've encountered a few things that stand in the way of
> enabling such a test by default:
>
> 1) During startup StartupSUBTRANS() zeroes out all pages between
> oldestActiveXID and nextXid. That takes 8s on my workstation, but only
> because I have plenty memory - pg_subtrans ends up 14GB as I currently do
> the test. Clearly not something we could do on the BF.
> ....
>
> 3) pg_resetwal -x requires to carefully choose an xid: It needs to be the
> first xid on a clog page. It's not hard to determine which xids are but it
> depends on BLCKSZ and a few constants in clog.c. I've for now hardcoded a
> value appropriate for 8KB, but ...

Maybe we could add a new pg_resetwal option? Something like pg_resetwal
--xid-near-wraparound, which would ask pg_resetwal to calculate the exact xid
value using values from pg_control and the clog macros?
I think it might come in handy for manual testing too.
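
As a rough sketch of the calculation such an option would need (not a patch,
just an illustration): the helper names below are hypothetical, and the
constants are assumed to mirror clog.c, where each transaction takes 2 status
bits, i.e. 4 transactions per byte.

```python
# Hypothetical helpers; constants assumed to mirror clog.c
# (2 status bits per transaction, so 4 transactions per byte).
BITS_PER_BYTE = 8
CLOG_BITS_PER_XACT = 2
CLOG_XACTS_PER_BYTE = BITS_PER_BYTE // CLOG_BITS_PER_XACT


def clog_xacts_per_page(blcksz: int = 8192) -> int:
    """Number of transaction statuses stored on one clog page."""
    return blcksz * CLOG_XACTS_PER_BYTE


def first_xid_on_clog_page(xid: int, blcksz: int = 8192) -> int:
    """Round xid down to the first xid on the clog page that contains it."""
    per_page = clog_xacts_per_page(blcksz)
    return (xid // per_page) * per_page
```

With BLCKSZ = 8192 a clog page covers 32768 xids, and 2000027648 is
61036 * 32768, which is why that hardcoded value is a page boundary for 8KB
blocks.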

> I have 2 1/2 ideas about addressing 1);
>
> - We could expose functionality to advance nextXid to a future value at
> runtime, without filling in clog/subtrans pages. Would probably have to live
> in varsup.c and be exposed via regress.so or such?

This option looks scary to me. Several functions rely on the fact that
StartupSUBTRANS() has zeroed the pages.
And if we make it conditional just for tests, it means that we won't
test the real code path.

> - The only reason StartupSUBTRANS() does that work is because of the prepared
> transaction holding back oldestActiveXID. That transaction in turn exists to
> prevent autovacuum from doing anything before we do test setup steps.
>
> Perhaps it'd be sufficient to set autovacuum_naptime really high initially,
> perform the test setup, set naptime to something lower, reload config. But
> I'm worried that might not be reliable: If something ends up allocating an
> xid we'd potentially reach the path in GetNewTransactionId() that wakes up
> the launcher? But probably there wouldn't be anything doing so?
>
> Another aspect that might not make this a good choice is that it actually
> seems relevant to be able to test cases where there are very old still
> running transactions...

Maybe this exact scenario can be covered with a separate long-running
test, not included in the buildfarm test suite?

--
Best regards,
Lubennikova Anastasia

In response to

Testing autovacuum wraparound (including failsafe) at 2021-04-23 20:43:06 from Andres Freund

Responses

Re: Testing autovacuum wraparound (including failsafe) at 2021-06-11 01:18:50 from Andres Freund
