Re: Testing autovacuum wraparound (including failsafe)

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Testing autovacuum wraparound (including failsafe)
Date: 2021-04-23 23:12:33
Message-ID: CAH2-Wz=f2dfc-0BcM5X5ttXTeRNPg+JtxEMCuSZR82a7XHfRkg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Apr 23, 2021 at 1:43 PM Andres Freund <andres(at)anarazel(dot)de> wrote:
> I started to write a test for $Subject, which I think we sorely need.

+1

> Currently my approach is to:
> - start a cluster, create a few tables with test data
> - acquire SHARE UPDATE EXCLUSIVE in a prepared transaction, to prevent
> autovacuum from doing anything
> - cause dead tuples to exist
> - restart
> - run pg_resetwal -x 2000027648
> - do things like acquiring pins on pages that block vacuum from progressing
> - commit prepared transaction
> - wait for template0, template1 datfrozenxid to increase
> - wait for relfrozenxid for most relations in postgres to increase
> - release buffer pin
> - wait for postgres datfrozenxid to increase

Even just having a standard-ish way to do stress testing like this
would be valuable.

> 2) FAILSAFE_MIN_PAGES is 4GB - which seems to make it infeasible to test the
> failsafe mode, we can't really create 4GB relations on the BF. While
> writing the tests I've lowered this to 4MB...

The only reason I chose 4GB for FAILSAFE_MIN_PAGES is that the
related VACUUM_FSM_EVERY_PAGES constant was 8GB -- the latter limits
how often we'll consider the failsafe in the single-pass/no-indexes
case.

I see no reason why it cannot be changed now. VACUUM_FSM_EVERY_PAGES
also frustrates FSM testing in the single-pass case in about the same
way, so maybe that should be considered as well? Note that the FSM
handling for the single-pass case is actually a bit different to the
two-pass/has-indexes case, since the single-pass case calls
lazy_vacuum_heap_page() directly in its first and only pass over the
heap (that's the whole point of having it, of course).
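
To put those thresholds in terms of heap pages, here is a rough Python
sketch of the arithmetic. It assumes the default 8KB BLCKSZ; the helper
name pages_for_bytes is made up for illustration and is not something
from vacuumlazy.c.

```python
# Sketch only: convert the byte thresholds discussed above into heap pages.
# Assumes the default block size (BLCKSZ = 8192); other builds will differ.
BLCKSZ = 8192

KB = 1024
MB = 1024 * KB
GB = 1024 * MB


def pages_for_bytes(nbytes, blcksz=BLCKSZ):
    """Number of whole heap pages covered by nbytes (hypothetical helper)."""
    return nbytes // blcksz


# 4GB threshold mentioned for FAILSAFE_MIN_PAGES
failsafe_pages = pages_for_bytes(4 * GB)
# 8GB threshold mentioned for VACUUM_FSM_EVERY_PAGES
fsm_pages = pages_for_bytes(8 * GB)
# 4MB value used while writing the tests
lowered_pages = pages_for_bytes(4 * MB)

print(failsafe_pages, fsm_pages, lowered_pages)
```

With 8KB pages, the 4MB value corresponds to a few hundred pages, which
is the kind of table size a buildfarm animal can reasonably create.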

> 3) pg_resetwal -x requires to carefully choose an xid: It needs to be the
> first xid on a clog page. It's not hard to determine which xids are but it
> depends on BLCKSZ and a few constants in clog.c. I've for now hardcoded a
> value appropriate for 8KB, but ...

Ugh.
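
For what it's worth, the boundary calculation could be sketched roughly
as below. This assumes the clog layout of 2 status bits per transaction
(so 4 xids per byte, BLCKSZ * 4 per page); the function names are
invented for illustration, and the real constants live in clog.c.

```python
# Sketch only: find the first xid on the clog page containing a given xid.
# Assumes 2 status bits per transaction, as in clog.c; names are hypothetical.
CLOG_BITS_PER_XACT = 2
BITS_PER_BYTE = 8
CLOG_XACTS_PER_BYTE = BITS_PER_BYTE // CLOG_BITS_PER_XACT  # 4


def clog_xacts_per_page(blcksz=8192):
    """How many xids fit on one clog page for a given block size."""
    return blcksz * CLOG_XACTS_PER_BYTE


def first_xid_on_clog_page(xid, blcksz=8192):
    """Round xid down to the first xid on its clog page."""
    per_page = clog_xacts_per_page(blcksz)
    return (xid // per_page) * per_page


def is_clog_page_boundary(xid, blcksz=8192):
    """True if xid is the first xid on a clog page."""
    return xid % clog_xacts_per_page(blcksz) == 0


# The value passed to pg_resetwal -x in the quoted steps
print(is_clog_page_boundary(2000027648))
# Rounding down an arbitrary nearby xid lands on the same boundary
print(first_xid_on_clog_page(2000027651))
```

A test could derive the block size from the cluster it is running
against and compute the value this way rather than hardcoding it.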

> For 2), I don't really have a better idea than making that configurable
> somehow?

That could make sense as a developer/testing option, I suppose. I just
doubt that it makes sense as anything else.

> 2021-04-23 13:32:30.899 PDT [2027738] LOG: automatic aggressive vacuum to prevent wraparound of table "postgres.public.small_trunc": index scans: 1
> pages: 400 removed, 28 remain, 0 skipped due to pins, 0 skipped frozen
> tuples: 14000 removed, 1000 remain, 0 are dead but not yet removable, oldest xmin: 2000027651
> buffer usage: 735 hits, 1262 misses, 874 dirtied
> index scan needed: 401 pages from table (1432.14% of total) had 14000 dead item identifiers removed
> index "small_trunc_pkey": pages: 43 in total, 37 newly deleted, 37 currently deleted, 0 reusable
> avg read rate: 559.048 MB/s, avg write rate: 387.170 MB/s
> system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s
> WAL usage: 1809 records, 474 full page images, 3977538 bytes
>
> '1432.14% of total' - looks like removed pages need to be added before the
> percentage calculation?

Clearly this needs to account for removed heap pages, so that the
percentage of pages with LP_DEAD items is consistently expressed
relative to the original table size. I can fix this shortly.
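
To illustrate with the numbers from the log output above, a minimal
sketch of the arithmetic (not the eventual fix; the function and
parameter names are made up for the example):

```python
# Sketch only: percentage of heap pages that had LP_DEAD items, computed
# against the original table size (remaining pages + removed pages).
# Function and parameter names are hypothetical, not from vacuumlazy.c.
def lpdead_page_pct(lpdead_pages, remaining_pages, removed_pages=0):
    orig_pages = remaining_pages + removed_pages
    if orig_pages == 0:
        return 0.0
    return 100.0 * lpdead_pages / orig_pages


# Numbers from the log: 401 pages with dead items, 28 remain, 400 removed.
reported = lpdead_page_pct(401, 28)        # ignores removed pages
adjusted = lpdead_page_pct(401, 28, 400)   # counts removed pages

print(f"{reported:.2f}% vs {adjusted:.2f}%")
```

Dividing by 28 reproduces the 1432.14% figure, while dividing by
28 + 400 = 428 keeps the result below 100%.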

--
Peter Geoghegan
