Re: Column Filtering in Logical Replication

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Justin Pryzby <pryzby(at)telsasoft(dot)com>, Rahila Syed <rahilasyed90(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, "shiy(dot)fnst(at)fujitsu(dot)com" <shiy(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Column Filtering in Logical Replication
Date: 2022-03-26 21:52:53
Message-ID: 7dbaf4d7-713f-26e8-13c7-b79ed2c0cb16@enterprisedb.com
Lists: pgsql-hackers

On 3/26/22 22:37, Tom Lane wrote:
> Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> writes:
>> I went over the patch again, polished the commit message a bit, and
>> pushed. May the buildfarm be merciful!
>
> Initial results aren't that great. komodoensis[1], petalura[2],
> and snapper[3] have all shown variants of
>
> # Failed test 'partitions with different replica identities not replicated correctly'
> # at t/031_column_list.pl line 734.
> # got: '2|4|
> # 4|9|'
> # expected: '1||5
> # 2|4|
> # 3||8
> # 4|9|'
> # Looks like you failed 1 test of 34.
> [18:19:36] t/031_column_list.pl ...............
> Dubious, test returned 1 (wstat 256, 0x100)
> Failed 1/34 subtests
>
> snapper reported different actual output than the other two:
> # got: '1||5
> # 3||8'
>
> The failure seems intermittent, as both komodoensis and petalura
> have also passed cleanly since the commit (snapper's only run once).
>
> This smells like an uninitialized-variable problem, but I've had
> no luck finding any problem under valgrind. Not sure how to progress
> from here.
>

I think I see the problem - there's a CREATE SUBSCRIPTION but the test
is not waiting for the tablesync to complete, so sometimes the initial
sync finishes before the check and sometimes it doesn't. That'd explain
the flaky behavior, and it's just this one test that's missing the wait
for sync AFAICS.
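
To illustrate the kind of wait that is missing, here is a minimal sketch of polling until the initial table sync is done. It is not the actual fix to t/031_column_list.pl (which is Perl); the helper name wait_for_tablesync and the run_query callback are hypothetical, and the only PostgreSQL-specific piece is the query against pg_subscription_rel, where srsubstate 'r' means ready and 's' means synchronized.

```python
import time

# Returns true once no subscribed table is still in a pre-sync state.
# 'r' = ready, 's' = synchronized (pg_subscription_rel.srsubstate).
SYNCED_QUERY = (
    "SELECT count(1) = 0 FROM pg_subscription_rel "
    "WHERE srsubstate NOT IN ('r', 's')"
)


def wait_for_tablesync(run_query, timeout=180.0, interval=0.1,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll the subscriber until the initial table sync has completed.

    run_query is a hypothetical callback that executes SQL on the
    subscriber and returns the boolean result. Returns True once the
    sync is complete, False if the timeout expires first.
    """
    deadline = clock() + timeout
    while True:
        if run_query(SYNCED_QUERY):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A test would call this right after CREATE SUBSCRIPTION and before comparing table contents, so the comparison never races with the initial copy.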

FWIW I did run this under valgrind a number of times, and also on
various ARM machines that tend to trip over memory issues.

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
