Re: Adding CI to our tree

From:	Andres Freund <andres(at)anarazel(dot)de>
To:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc:	Justin Pryzby <pryzby(at)telsasoft(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Melanie Plageman <melanieplageman(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Daniel Gustafsson <daniel(at)yesql(dot)se>
Subject:	Re: Adding CI to our tree
Date:	2022-01-19 04:16:46
Message-ID:	20220119041646.rhuo3youiqxqjmo2@alap3.anarazel.de
Lists:	pgsql-hackers

Hi,

On 2022-01-18 21:50:07 -0500, Tom Lane wrote:
> I just found one thing making check-world slower than it ought to be:
> src/test/recovery/t/008_fsm_truncation.pl does
>
> $node_primary->append_conf(
> 'postgresql.conf', qq{
> fsync = on
> wal_log_hints = on
> max_prepared_transactions = 5
> autovacuum = off
> });
>
> There is no reason for this script to be overriding Cluster.pm's
> fsync = off setting.
>
> This appears to go back to 917dc7d23 of 2016, so I think it just
> predates our recognition that we should disable fsync in routine
> tests.

Yea, I noticed this too. I was wondering if there's a conceivable reason to
actually want fsyncs, but I couldn't come up with one.

On systems where IO isn't overloaded, the main problem with this test is
elsewhere: it repeatedly waits for VACUUMs that are blocked truncating the
table, which these days takes 5 seconds. Thus the test takes quite a while.

To me VACUUM_TRUNCATE_LOCK_TIMEOUT = 5s seems awfully long. On a system with a
lot of tables that's much more than vacuum will take. So this can easily lead
to using up all autovacuum workers...
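
To put a rough number on that, here is a minimal sketch of a retry-until-timeout lock wait. It is not the vacuum code itself: the 5000 ms timeout is the value mentioned above, while the 50 ms retry interval, the callback type and all function names are assumptions made up for illustration. Elapsed time is only counted, nothing actually sleeps.

```c
#include <stdbool.h>
#include <stddef.h>

/* Timeout as discussed above; the retry interval is an assumed value. */
#define TRUNCATE_LOCK_TIMEOUT_MS        5000
#define TRUNCATE_LOCK_WAIT_INTERVAL_MS  50

/* Hypothetical callback: returns true once the lock could be taken. */
typedef bool (*try_lock_fn) (void *arg);

/*
 * Retry a conditional lock attempt until it succeeds or the timeout is
 * reached. Returns the simulated wait in ms; *got_lock reports the outcome.
 */
int
wait_for_truncate_lock(try_lock_fn try_lock, void *arg, bool *got_lock)
{
    int         waited_ms = 0;

    for (;;)
    {
        if (try_lock(arg))
        {
            *got_lock = true;
            return waited_ms;
        }
        if (waited_ms >= TRUNCATE_LOCK_TIMEOUT_MS)
        {
            *got_lock = false;
            return waited_ms;
        }
        /* Stands in for sleeping one retry interval. */
        waited_ms += TRUNCATE_LOCK_WAIT_INTERVAL_MS;
    }
}

/* A lock that never becomes available, like the blocked VACUUMs in the test. */
bool
never_available(void *arg)
{
    (void) arg;
    return false;
}

/* Worst case for one worker: every table's truncation waits the full timeout. */
long
worst_case_wait_ms(int ntables)
{
    long        total = 0;
    bool        got_lock;

    for (int i = 0; i < ntables; i++)
        total += wait_for_truncate_lock(never_available, NULL, &got_lock);
    return total;
}
```

Under these assumptions, a worker that hits a blocked truncation on each of 100 tables spends 500 seconds doing nothing but waiting.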

> This actually causes parallel check-world to fail altogether on florican's
> host, because the initial fsync of the recovered primary takes more than 3
> minutes when there's conflicting I/O traffic, causing pg_ctl to time out.

Ugh.

I noticed a few other sources of "unnecessary" fsyncs. The most frequent
is the durable_rename() of backup_manifest in pg_basebackup.c. Manifests are
surprisingly large, 135k for a freshly initdb'd cluster.
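
For readers unfamiliar with the pattern, the following is a rough sketch of what a durable rename costs in terms of fsyncs, and how a caller-supplied flag could skip them. This is not PostgreSQL's durable_rename(); sync_path(), rename_maybe_durable() and the do_sync parameter are hypothetical names, and the sketch assumes a POSIX system where fsync() on a read-only descriptor is allowed.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <unistd.h>

/* fsync one path (file or directory); returns 0 on success, -1 on error. */
int
sync_path(const char *path)
{
    int         fd = open(path, O_RDONLY);
    int         rc;

    if (fd < 0)
        return -1;
    rc = fsync(fd);
    close(fd);
    return rc;
}

/*
 * Rename oldpath to newpath. With do_sync set, flush the file before the
 * rename and the containing directory afterwards, so that both the contents
 * and the new name survive a crash. dirpath is the directory holding
 * newpath; it is passed in explicitly to keep the sketch short.
 */
int
rename_maybe_durable(const char *oldpath, const char *newpath,
                     const char *dirpath, bool do_sync)
{
    if (do_sync && sync_path(oldpath) != 0)
        return -1;
    if (rename(oldpath, newpath) != 0)
        return -1;
    if (do_sync && sync_path(dirpath) != 0)
        return -1;
    return 0;
}
```

With do_sync = false the call is just a rename(); with do_sync = true it adds two fsyncs, one of which has to flush the whole file if it hasn't been written back yet.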

There's an fsync in walmethods.c:tar_close() that sounds intentional, but I
don't really understand what the comment means:

/* Always fsync on close, so the padding gets fsynced */
if (tar_sync(f) < 0)
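
As background on the "padding": the tar format stores member data in 512-byte blocks, so a member whose length is not a multiple of 512 is followed by zero bytes up to the next block boundary. Presumably that is what the comment refers to, though that is a guess. A small sketch of the arithmetic, with a made-up function name rather than anything taken from walmethods.c:

```c
#include <stddef.h>

#define TAR_BLOCK_SIZE 512

/*
 * Number of zero bytes needed after a member of data_len bytes so that the
 * next tar header starts on a 512-byte block boundary.
 */
size_t
tar_padding_bytes(size_t data_len)
{
    return (TAR_BLOCK_SIZE - data_len % TAR_BLOCK_SIZE) % TAR_BLOCK_SIZE;
}
```

A file whose size is already a multiple of 512, such as a 16MB WAL segment, needs no padding at all, while a 1000-byte member needs 24 bytes.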

Greetings,

Andres Freund

In response to

Re: Adding CI to our tree at 2022-01-19 02:50:07 from Tom Lane

Responses

Re: Adding CI to our tree at 2022-01-19 04:39:20 from Tom Lane
Re: Adding CI to our tree at 2022-01-19 04:54:12 from Tom Lane
pg_basebackup fsyncs some files despite --no-sync (was: Adding CI to our tree) at 2022-01-21 20:00:57 from Andres Freund
