Re: Testing autovacuum wraparound (including failsafe)

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Ian Lawrence Barwick <barwick(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Anastasia Lubennikova <lubennikovaav(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>
Subject: Re: Testing autovacuum wraparound (including failsafe)
Date: 2023-03-08 04:52:31
Message-ID: CAD21AoDouzPRqhZJWpoXje=MPFUx8kM3SGa2MkrRtsjzJSrSpQ@mail.gmail.com
Lists: pgsql-hackers

On Fri, Mar 3, 2023 at 8:34 PM Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>
> On 16/11/2022 06:38, Ian Lawrence Barwick wrote:
> > Thanks for the patch. While reviewing the patch backlog, we have determined that
> > the latest version of this patch was submitted before meson support was
> > implemented, so it should have a "meson.build" file added for consideration for
> > inclusion in PostgreSQL 16.
>
> I wanted to do some XID wraparound testing again, to test the 64-bit
> SLRUs patches [1], and revived this.

Thank you for reviving this thread!

>
> I took a different approach to consuming the XIDs. Instead of setting
> nextXID directly, bypassing GetNewTransactionId(), this patch introduces
> a helper function to call GetNewTransactionId() repeatedly. But because
> that's slow, it does include a shortcut to skip over "uninteresting"
> XIDs. Whenever nextXid is close to an SLRU page boundary or XID
> wraparound, it calls GetNewTransactionId(), and otherwise it bumps up
> nextXid close to the next "interesting" value. That's still a lot slower
> than just setting nextXid, but exercises the code more realistically.
>
> I've written some variant of this helper function many times over the
> years, for ad hoc testing. I'd love to have it permanently in the git tree.

These functions seem to be better than mine.
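
To make sure I understand the shortcut, here is a rough model of the idea in Python. It is only a sketch of my reading of the description above, not the code in the patch. The page size of 32768 XIDs assumes 8kB blocks with two status bits per transaction, the margin of 5 is a made-up value for what counts as "close", and the function names are hypothetical. It also ignores the special XIDs that are skipped after wraparound. Since 2^32 is a multiple of the page size, being near wraparound is just a special case of being near a page boundary.

```python
XACTS_PER_SLRU_PAGE = 32768  # assumption: 8kB page, 2 status bits per XID
MARGIN = 5                   # hypothetical: how near a boundary counts as "interesting"
XID_SPACE = 2 ** 32          # 32-bit XIDs wrap around here


def is_interesting(full_xid):
    """True when the XID is within MARGIN of an SLRU page boundary.

    XID_SPACE is a multiple of XACTS_PER_SLRU_PAGE, so XIDs near
    wraparound are covered by the same check.
    """
    offset = (full_xid % XID_SPACE) % XACTS_PER_SLRU_PAGE
    return offset < MARGIN or offset >= XACTS_PER_SLRU_PAGE - MARGIN


def consume_xids(next_xid, count):
    """Advance a 64-bit XID counter by count.

    Returns (new_next_xid, slow_calls), where slow_calls is how many XIDs
    were allocated one at a time (standing in for GetNewTransactionId()).
    """
    target = next_xid + count
    slow_calls = 0
    while next_xid < target:
        if is_interesting(next_xid):
            # Near a boundary: allocate this XID the slow, realistic way.
            next_xid += 1
            slow_calls += 1
        else:
            # Far from a boundary: jump to just before the next one.
            offset = next_xid % XACTS_PER_SLRU_PAGE
            jump = (XACTS_PER_SLRU_PAGE - MARGIN) - offset
            next_xid = min(next_xid + jump, target)
    return next_xid, slow_calls
```

In this model only about ten XIDs per page go through the slow path, which is why the approach is much faster than calling GetNewTransactionId() for every XID yet still slower than setting nextXid directly.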

> In addition to Masahiko's test for emergency vacuum, this includes two
> other tests. 002_limits.pl tests the "warn limit" and "stop limit" in
> GetNewTransactionId(), and 003_wraparound.pl burns through 10 billion
> transactions in total, exercising XID wraparound in general.
> Unfortunately these tests are pretty slow; the tests run for about 4
> minutes on my laptop in total, and use about 20 GB of disk space. So
> perhaps these need to be put in a special test suite that's not run as
> part of "check-world". Or perhaps leave out the 003_wraparounds.pl test,
> that's the slowest of the tests. But I'd love to have these in the git
> tree in some form.

cfbot reports some failures. The main reason seems to be that meson.build
in the xid_wraparound directory adds regression tests, but the .sql and
.out files are missing from the patch. Perhaps the patch intends to add
only TAP tests, given that the Makefile doesn't define REGRESS?

Even after fixing this issue, the CI tests (Cirrus CI) are still not
happy and report failures because the disk fills up. The xid_wraparound
test directory accounts for 105MB of the 262MB total:

% du -sh testrun
262M testrun
% du -sh testrun/xid_wraparound/
105M testrun/xid_wraparound/
% du -sh testrun/xid_wraparound/*
460K testrun/xid_wraparound/001_emergency_vacuum
93M testrun/xid_wraparound/002_limits
12M testrun/xid_wraparound/003_wraparounds
% ls -lh testrun/xid_wraparound/002_limits/log*
total 93M
-rw-------. 1 masahiko masahiko 93M Mar 7 17:34 002_limits_wraparound.log
-rw-rw-r--. 1 masahiko masahiko 20K Mar 7 17:34 regress_log_002_limits

The biggest file is the server log, since an autovacuum worker writes
autovacuum log entries for every table every second (autovacuum_naptime
is 1s). Maybe we can set the log_autovacuum_min_duration reloption on the
test tables instead of enabling it globally.

The 001 test uses a 2PC transaction that holds locks on tables, but
since we can consume XIDs while the server is running, we don't need
that. Instead, I think we can keep a transaction open in the background
like the 002 test does.

I'll try these ideas.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
