Re: logical decoding and replication of sequences

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: logical decoding and replication of sequences
Date: 2022-03-08 22:44:40
Message-ID: b775bee1-b6c8-098e-9c78-5a5f7ec9abdd@enterprisedb.com
Lists: pgsql-hackers

On 3/7/22 22:11, Tomas Vondra wrote:
>
>
> On 3/7/22 17:39, Tomas Vondra wrote:
>>
>>
>> On 3/1/22 12:53, Amit Kapila wrote:
>>> On Mon, Feb 28, 2022 at 5:16 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>>>
>>>> On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
>>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>>>
>>>>> On 2/10/22 19:17, Tomas Vondra wrote:
>>>>>> I've polished & pushed the first part adding sequence decoding
>>>>>> infrastructure etc. Attached are the two remaining parts.
>>>>>>
>>>>>> I plan to wait a day or two and then push the test_decoding part. The
>>>>>> last part (for built-in replication) will need more work and maybe
>>>>>> rethinking the grammar etc.
>>>>>>
>>>>>
>>>>> I've pushed the second part, adding sequences to test_decoding.
>>>>>
>>>>
>>>> The test_decoding tests have been failing randomly in the last few
>>>> days. I am not completely sure, but the failures might be related to
>>>> this work. These two appear to be due to the same reason:
>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-17%2015%3A17%3A07
>>>>
>>>> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
>>>> "reorderbuffer.c", Line: 1173, PID: 35013)
>>>> 0 postgres 0x00593de0 ExceptionalCondition + 160
>>>>
>>>
>>> While reviewing the code for this, I noticed that in
>>> sequence_decode(), we don't call ReorderBufferProcessXid to register
>>> the first known LSN in WAL for the current xid. Similar functions such
>>> as logicalmsg_decode() and heap_decode() do call ReorderBufferProcessXid
>>> even if they decide not to queue or send the change. Is there a reason
>>> for not doing the same here? However, I am not able to deduce any
>>> scenario where the lack of this call would lead to such an assertion
>>> failure.
>>> Any thoughts?
>>>
>>
>> Thanks, that seems like an omission. Will fix.
>>
>
> I've pushed this simple fix. Not sure it'll fix the assert failures on
> skink/locust, though. Given the lack of information it'll be difficult
> to verify. So let's wait a bit.
>

I've done about 5000 runs of 'make check' in test_decoding, on two
Raspberry Pi machines (one armv7, one aarch64). Not a single assert
failure :-(
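
A repeated-run check of this kind can be scripted. What follows is only a
minimal sketch, not the script actually used for the runs above: the function
name soak() is hypothetical, and treating any non-zero exit status as a
failure is an assumption (a crashed backend should make the test run fail,
but the assertion message itself would still have to be looked up in the
server log).

```python
import subprocess
import sys


def soak(cmd, runs, cwd=None):
    """Run `cmd` `runs` times and return the 1-based indexes of failed runs.

    Assumption: a run counts as failed when the command exits non-zero.
    """
    failed = []
    for i in range(1, runs + 1):
        result = subprocess.run(
            cmd,
            cwd=cwd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            text=True,
        )
        if result.returncode != 0:
            # Report the failing iteration so its logs can be inspected.
            print(f"run {i}: exit status {result.returncode}", file=sys.stderr)
            failed.append(i)
    return failed
```

Assuming a built source tree, it could be invoked as
soak(["make", "check"], 5000, cwd="contrib/test_decoding"); an empty list
means no run failed.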

How come skink/locust hit that in just a couple runs?

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
