Re: Something is wrong with wal_compression

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Something is wrong with wal_compression
Date: 2023-01-30 05:57:13
Message-ID: Y9dcOTrHIQC2BgXC@paquier.xyz
Lists: pgsql-hackers

On Sat, Jan 28, 2023 at 12:02:23AM -0500, Tom Lane wrote:
> My thoughts were trending in that direction too. It's starting
> to sound like we aren't going to be able to make a fix that
> we'd be willing to risk back-patching, even if it were completely
> compatible at the user level.
>
> Still, the idea that txid_status() isn't trustworthy is rather
> scary. I wonder whether there is a failure mode here that's
> exhibitable without using that.

Okay, as far as I can see, the consensus would be to not do anything
about the performance impact of these functions:
20210305.115011.558061052471425531.horikyota.ntt@gmail.com

Three of my buildfarm machines are unstable because of that. They need
something for the stable branches as well, and I'd like them to keep
stressing their options.

Based on what's been mentioned, we can:
1) tweak the test with an extra checkpoint to make sure that the XIDs
are flushed, like in the patch posted on [1].
2) tweak the test to rely on a state of the table, as
mentioned by Andrey.
3) remove the test entirely, because as introduced it does not
actually test what it should.

2) is not really interesting, IMO, because the test checks for two
things:
- an in-progress XID, which we already do in the main regression test
suite.
- a post-crash state, and switching to an approach where some data is
scanned, for example, is no different from a lot of the other recovery
tests.

1) means more test cycles, and perhaps we could enforce compression of
WAL while at it? In the end, my vote would just go for 3) and drop
the whole scenario, though there may be an argument for 1).

[1]: https://www.postgresql.org/message-id/20210305.115011.558061052471425531.horikyota.ntt@gmail.com
--
Michael
