Re: Testing autovacuum wraparound (including failsafe)

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Testing autovacuum wraparound (including failsafe)
Date: 2023-03-14 06:01:30
Message-ID: CAD21AoAyYBZOiB1UPCPZJHTLk0-arrq5zqNGj+PrsbpdUy=g-g@mail.gmail.com
Lists: pgsql-hackers

On Wed, Mar 8, 2023 at 1:52 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Fri, Mar 3, 2023 at 8:34 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> >
> > On 16/11/2022 06:38, Ian Lawrence Barwick wrote:
> > > Thanks for the patch. While reviewing the patch backlog, we have determined that
> > > the latest version of this patch was submitted before meson support was
> > > implemented, so it should have a "meson.build" file added for consideration for
> > > inclusion in PostgreSQL 16.
> >
> > I wanted to do some XID wraparound testing again, to test the 64-bit
> > SLRUs patches [1], and revived this.
>
> Thank you for reviving this thread!
>
> >
> > I took a different approach to consuming the XIDs. Instead of setting
> > nextXID directly, bypassing GetNewTransactionId(), this patch introduces
> > a helper function to call GetNewTransactionId() repeatedly. But because
> > that's slow, it does include a shortcut to skip over "uninteresting"
> > XIDs. Whenever nextXid is close to an SLRU page boundary or XID
> > wraparound, it calls GetNewTransactionId(), and otherwise it bumps up
> > nextXid close to the next "interesting" value. That's still a lot slower
> > than just setting nextXid, but exercises the code more realistically.
> >
> > I've written some variant of this helper function many times over the
> > years, for ad hoc testing. I'd love to have it permanently in the git tree.
>
> These functions seem to be better than mine.
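
For anyone following along, the skip-ahead idea described above can be sketched roughly as follows. This is only an illustration in Python, not the code from the patch: the page size of 32768 XIDs assumes an 8 kB block with 4 transactions per byte (as for the commit log), MARGIN is a hypothetical safety window, and the counter is treated as a plain 64-bit value with the special XIDs ignored. Since 2^32 is a multiple of the page size, 32-bit wraparound always coincides with a page boundary in this model.

```python
# Simplified model of "call the slow path near interesting XIDs, jump otherwise".
XACTS_PER_PAGE = 32768   # assumption: 8 kB block * 4 xacts per byte
XID_SPACE = 2 ** 32      # 32-bit XID space; a multiple of XACTS_PER_PAGE
MARGIN = 5               # hypothetical window around each page boundary


def consume_xids(next_xid, count, margin=MARGIN):
    """Advance next_xid by count and return (new_next_xid, slow_calls).

    slow_calls counts the steps that would go through the real allocator
    (a stand-in for GetNewTransactionId()); all other XIDs are skipped by
    bumping the counter directly.
    """
    target = next_xid + count
    slow_calls = 0
    while next_xid < target:
        offset = (next_xid % XID_SPACE) % XACTS_PER_PAGE
        if offset < margin or offset >= XACTS_PER_PAGE - margin:
            # Close to a page boundary (or wraparound): take the slow path.
            next_xid += 1
            slow_calls += 1
        else:
            # Uninteresting range: jump to just before the next boundary.
            jump = XACTS_PER_PAGE - margin - offset
            next_xid += min(jump, target - next_xid)
    return next_xid, slow_calls
```

With these assumed numbers, crossing two pages takes only a handful of slow-path calls per boundary instead of one per XID.
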
>
> > In addition to Masahiko's test for emergency vacuum, this includes two
> > other tests. 002_limits.pl tests the "warn limit" and "stop limit" in
> > GetNewTransactionId(), and 003_wraparounds.pl burns through 10 billion
> > transactions in total, exercising XID wraparound in general.
> > Unfortunately these tests are pretty slow; the tests run for about 4
> > minutes on my laptop in total, and use about 20 GB of disk space. So
> > perhaps these need to be put in a special test suite that's not run as
> > part of "check-world". Or perhaps leave out the 003_wraparounds.pl test,
> > that's the slowest of the tests. But I'd love to have these in the git
> > tree in some form.
>
> cfbot reports some failures. The main reason seems to be that meson.build
> in the xid_wraparound directory adds the regression tests, but the .sql
> and .out files are missing from the patch. Perhaps the patch is meant to
> add only TAP tests, as the Makefile doesn't define REGRESS?
>
> Even after fixing this issue, the CI tests (Cirrus CI) are not happy and
> report failures due to a full disk. The xid_wraparound test directory
> accounts for 105MB out of 262MB:
>
> % du -sh testrun
> 262M testrun
> % du -sh testrun/xid_wraparound/
> 105M testrun/xid_wraparound/
> % du -sh testrun/xid_wraparound/*
> 460K testrun/xid_wraparound/001_emergency_vacuum
> 93M testrun/xid_wraparound/002_limits
> 12M testrun/xid_wraparound/003_wraparounds
> % ls -lh testrun/xid_wraparound/002_limits/log*
> total 93M
> -rw-------. 1 masahiko masahiko 93M Mar 7 17:34 002_limits_wraparound.log
> -rw-rw-r--. 1 masahiko masahiko 20K Mar 7 17:34 regress_log_002_limits
>
> The biggest file is the server log, since an autovacuum worker writes
> an autovacuum log entry for every table every second (autovacuum_naptime
> is 1s). Maybe we can set the log_autovacuum_min_duration reloption on the
> test tables instead of enabling it globally.

I think that could be acceptable, since the 002 and 003 tests are executed
only when required. The 001 test seems to be able to pass on cfbot, but
it takes more than 30 seconds. In the attached patch, I made these tests
optional: they are enabled only if the environment variable
ENABLE_XID_WRAPAROUND_TESTS is defined (supported only with autoconf).

>
> The 001 test uses a 2PC transaction that holds locks on tables, but
> since we can consume XIDs while the server is running, we don't need
> that. Instead, I think we can keep a transaction open in the background
> like the 002 test does.

Updated in the new patch. I also added a check for whether the failsafe
mode is triggered.
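
As a rough illustration of what such a check can look like (this is a Python sketch, not the test code in the patch), one could scan the server log for the failsafe warning. The message wording in the pattern below is my assumption about what vacuum emits when the failsafe kicks in, so please verify it against vacuumlazy.c in your tree; failsafe_tables is a hypothetical helper name.

```python
import re

# Assumed wording of the failsafe WARNING; check it against your source tree.
FAILSAFE_RE = re.compile(
    r'bypassing nonessential maintenance of table "(?P<table>[^"]+)" as a failsafe'
)


def failsafe_tables(log_text):
    """Return the table names mentioned in failsafe warnings, in log order."""
    return [m.group("table") for m in FAILSAFE_RE.finditer(log_text)]
```

A test would then pass if the returned list contains the table it expects the failsafe to have handled.
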

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment: v2-0001-Add-tests-for-XID-wraparound.patch (application/octet-stream, 22.5 KB)
