Re: Fixing WAL instability in various TAP tests

From: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michael Paquier <michael(at)paquier(dot)xyz>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Noah Misch <noah(at)leadboat(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fixing WAL instability in various TAP tests
Date: 2021-09-28 18:43:30
Message-ID: 2B700487-C43E-41AF-A8B3-444A12DFEC30@enterprisedb.com
Lists: pgsql-hackers

> On Sep 28, 2021, at 11:07 AM, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
> Looking closer at the TAP test, it's not ORDERing the result set from the SELECTs on either node, but it is comparing the sets for stringwise equality, which is certainly order dependent.
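The order dependence in the quoted message can be illustrated outside the TAP framework. This is a minimal Python sketch, not the test's Perl code. It assumes each result row is one line of `|`-separated psql output, and the row values are hypothetical. Adding an ORDER BY to the test's queries would be the direct fix; comparing sorted rows is shown only to make the difference visible.

```python
def rows(result: str) -> list[str]:
    """Split psql-style output (one row per line) into a list of rows."""
    return [line for line in result.splitlines() if line]


def same_rows(a: str, b: str) -> bool:
    """Order-insensitive comparison: equal as multisets of rows."""
    return sorted(rows(a)) == sorted(rows(b))


# Hypothetical results: the same rows, returned in a different order.
primary = "1|0\n2|1\n3|2"
standby = "2|1\n1|0\n3|2"

print(primary == standby)            # False: stringwise equality depends on order
print(same_rows(primary, standby))   # True: same rows, ignoring order
```

Because the comparison sorts rather than deduplicates, a duplicated row on one side still counts as a mismatch.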

I took the output from the buildfarm page, parsed out the first test's results, and compared got vs. expected for this check:

is($primary_result, $standby_result, "$test_name: query result matches");

The primary result had all the same rows as the standby, plus additional rows: the two results match except for rows that are present on the primary but missing from the standby. That seems consistent with the view that the query on the standby ran before all of the data had been replicated.
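The "standby is a subset of the primary" observation amounts to a multiset difference over the two outputs. A minimal Python sketch follows, assuming one row per line of output; the function names and sample rows are hypothetical and are not part of the TAP test.

```python
from collections import Counter


def missing_rows(primary: str, standby: str) -> list[str]:
    """Rows present on the primary but absent (or under-represented) on the standby."""
    diff = Counter(primary.splitlines()) - Counter(standby.splitlines())
    return sorted(diff.elements())


def extra_rows(primary: str, standby: str) -> list[str]:
    """Rows present on the standby but not on the primary (empty if standby is a subset)."""
    diff = Counter(standby.splitlines()) - Counter(primary.splitlines())
    return sorted(diff.elements())


# Hypothetical outputs: the standby is missing two of the primary's rows.
primary = "0|a\n1|b\n3|c"
standby = "1|b"
print(missing_rows(primary, standby))
print(extra_rows(primary, standby))
```

An empty `extra_rows` result together with a non-empty `missing_rows` result is the pattern described above.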

However, the missing rows all have column i equal to either 0 or 3, even though the test round-robins i over 0..9 as it performs the inserts. I would not expect the WAL for the inserts to cluster around any particular value of i. The DELETE and VACUUM commands do operate on a single value of i, so clustering would make sense if rows deleted on the primary had failed to be deleted on the standby, but in that case I'd expect the standby to have more rows, not fewer.
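Checking whether the missing rows cluster on particular values of i is a simple tally. This Python sketch assumes i is the first `|`-separated field of each row, which is a guess about the output format, and the sample rows are hypothetical. With round-robin inserts over 0..9, the tally would be expected to spread across all ten values rather than concentrate on one or two.

```python
from collections import Counter


def missing_by_i(missing: list[str]) -> Counter:
    """Count missing rows by the value of column i (assumed to be the first field)."""
    return Counter(int(row.split("|", 1)[0]) for row in missing)


# Hypothetical missing rows, clustered on i = 0 and i = 3.
tally = missing_by_i(["0|a", "3|b", "0|c"])
print(sorted(tally.items()))
```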

Perhaps a messed-up bloom index answers that, though. I think it should be easy enough to get the paths to the main fork of the heap table and of the bloom index on both the primary and the standby, and to compare the files as part of the WAL test. That would tell us whether they differ, and also whether the differences are limited to just one or the other.
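A block-by-block file comparison along these lines could look like the Python sketch below. It is not what the eventual TAP test does. It assumes the default 8192-byte block size, a relation small enough to fit in a single 1GB segment, and paths built from `pg_relation_filepath()` (which returns a path relative to the data directory). The data-directory variables in the usage comment are hypothetical. Byte differences are not automatically corruption: for example, hint bits are not always WAL-logged, so pages can legitimately differ. Both servers would presumably also need to flush their buffers (e.g. via CHECKPOINT) before the files are compared.

```python
BLCKSZ = 8192  # assumption: PostgreSQL's default block size


def differing_blocks(data_a: bytes, data_b: bytes, blcksz: int = BLCKSZ) -> list[int]:
    """Return block numbers whose bytes differ; a block present in only one input differs."""
    nblocks = -(-max(len(data_a), len(data_b)) // blcksz)  # ceiling division
    return [
        blkno
        for blkno in range(nblocks)
        if data_a[blkno * blcksz:(blkno + 1) * blcksz]
        != data_b[blkno * blcksz:(blkno + 1) * blcksz]
    ]


def compare_forks(path_a: str, path_b: str, blcksz: int = BLCKSZ) -> list[int]:
    """Read two relation fork files (small test relations) and compare them block by block."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return differing_blocks(fa.read(), fb.read(), blcksz)


# Hypothetical usage, with relpath from: SELECT pg_relation_filepath('tst');
# print(compare_forks(f"{primary_pgdata}/{relpath}", f"{standby_pgdata}/{relpath}"))
```

Reporting block numbers rather than a single yes/no makes it easier to see whether the heap and the bloom index differ in the same places.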

I'll go write that up....


Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
