From: | Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: logical decoding and replication of sequences |
Date: | 2022-03-08 22:44:40 |
Message-ID: | b775bee1-b6c8-098e-9c78-5a5f7ec9abdd@enterprisedb.com |
Lists: | pgsql-hackers |
On 3/7/22 22:11, Tomas Vondra wrote:
>
>
> On 3/7/22 17:39, Tomas Vondra wrote:
>>
>>
>> On 3/1/22 12:53, Amit Kapila wrote:
>>> On Mon, Feb 28, 2022 at 5:16 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>>>
>>>> On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
>>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>>>
>>>>> On 2/10/22 19:17, Tomas Vondra wrote:
>>>>>> I've polished & pushed the first part adding sequence decoding
>>>>>> infrastructure etc. Attached are the two remaining parts.
>>>>>>
>>>>>> I plan to wait a day or two and then push the test_decoding part. The
>>>>>> last part (for built-in replication) will need more work and maybe
>>>>>> rethinking the grammar etc.
>>>>>>
>>>>>
>>>>> I've pushed the second part, adding sequences to test_decoding.
>>>>>
>>>>
>>>> test_decoding has been failing randomly over the last few days. I am not
>>>> completely sure, but the failures might be related to this work. Two of
>>>> them appear to be due to the same reason:
>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
>>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-17%2015%3A17%3A07
>>>>
>>>> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
>>>> "reorderbuffer.c", Line: 1173, PID: 35013)
>>>> 0 postgres 0x00593de0 ExceptionalCondition + 160\\0
>>>>
>>>
>>> While reviewing the code for this, I noticed that in
>>> sequence_decode(), we don't call ReorderBufferProcessXid to register
>>> the first known lsn in WAL for the current xid. The similar functions
>>> logicalmsg_decode() or heap_decode() do call ReorderBufferProcessXid
>>> even if they decide not to queue or send the change. Is there a reason
>>> for not doing the same here? However, I am not able to deduce any
>>> scenario where lack of this will lead to such an Assertion failure.
>>> Any thoughts?
>>>
>>
>> Thanks, that seems like an omission. Will fix.
>>
>
> I've pushed this simple fix. Not sure it'll fix the assert failures on
> skink/locust, though. Given the lack of information it'll be difficult
> to verify. So let's wait a bit.
>
I've done about 5000 runs of 'make check' in test_decoding, on two rpi
machines (one armv7, one aarch64). Not a single assert failure :-(
How come skink/locust hit that in just a couple of runs?
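For readers following the thread, here is the invariant behind the failing assertion (prev_first_lsn < cur_txn->first_lsn). As we understand it, the reorder buffer keeps top-level transactions in the order in which each xid was first seen, and each transaction's first_lsn must increase strictly along that list. The C code below is a minimal toy model, not PostgreSQL's reorderbuffer.c. The names ToyReorderBuffer, toy_process_xid() and toy_lsn_order_ok() are hypothetical. toy_process_xid() only stands in for what ReorderBufferProcessXid is described as doing above: it registers the first known LSN for an xid.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of the first_lsn bookkeeping; not PostgreSQL code. */
typedef uint64_t ToyLSN;
typedef uint32_t ToyXid;

#define TOY_MAX_TXNS 16

typedef struct ToyTxn
{
    ToyXid      xid;
    ToyLSN      first_lsn;      /* LSN of the first record seen for this xid */
} ToyTxn;

typedef struct ToyReorderBuffer
{
    ToyTxn      txns[TOY_MAX_TXNS]; /* kept in the order xids were first seen */
    size_t      ntxns;
} ToyReorderBuffer;

/*
 * Register xid at lsn unless it is already known. A later record for the
 * same xid leaves first_lsn unchanged. Returns false only if the toy buffer
 * is full.
 */
static bool
toy_process_xid(ToyReorderBuffer *rb, ToyXid xid, ToyLSN lsn)
{
    for (size_t i = 0; i < rb->ntxns; i++)
        if (rb->txns[i].xid == xid)
            return true;        /* already registered */

    if (rb->ntxns >= TOY_MAX_TXNS)
        return false;

    rb->txns[rb->ntxns].xid = xid;
    rb->txns[rb->ntxns].first_lsn = lsn;
    rb->ntxns++;
    return true;
}

/* Mirror of the asserted invariant: first_lsn strictly increasing. */
static bool
toy_lsn_order_ok(const ToyReorderBuffer *rb)
{
    for (size_t i = 1; i < rb->ntxns; i++)
        if (!(rb->txns[i - 1].first_lsn < rb->txns[i].first_lsn))
            return false;
    return true;
}
```

Suppose toy_process_xid is called for every record, as logicalmsg_decode() and heap_decode() are said to do with the real function. Then first_lsn is the LSN of the xid's first WAL record. If a decoding path skips the call, as sequence_decode() did before the fix, the recorded first_lsn can only end up later than that first record. The toy model does not reproduce the buildfarm failure.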
regards
--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company