Re: logical decoding and replication of sequences

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: logical decoding and replication of sequences
Date: 2022-03-08 18:29:06
Message-ID: 5709e125-a50c-e386-3dce-1d662b588ac7@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 3/7/22 22:25, Tomas Vondra wrote:
>
>
> On 3/7/22 17:53, Tomas Vondra wrote:
>> On 2/28/22 12:46, Amit Kapila wrote:
>>> On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
>>> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>>>
>>>> On 2/10/22 19:17, Tomas Vondra wrote:
>>>>> I've polished & pushed the first part adding sequence decoding
>>>>> infrastructure etc. Attached are the two remaining parts.
>>>>>
>>>>> I plan to wait a day or two and then push the test_decoding part. The
>>>>> last part (for built-in replication) will need more work and maybe
>>>>> rethinking the grammar etc.
>>>>>
>>>>
>>>> I've pushed the second part, adding sequences to test_decoding.
>>>>
>>>
>>> The test_decoding is failing randomly in the last few days. I am not
>>> completely sure but they might be related to this work. The two of
>>> these appears to be due to the same reason:
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-17%2015%3A17%3A07
>>>
>>> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
>>> "reorderbuffer.c", Line: 1173, PID: 35013)
>>> 0 postgres 0x00593de0 ExceptionalCondition + 160\\0
>>>
>>
>> This might be related to the issue reported by Amit, i.e. that
>> sequence_decode does not call ReorderBufferProcessXid(). If this keeps
>> failing, we'll have to add some extra debug info (logging LSN etc.), at
>> least temporarily. It'd be valuable to inspect the WAL too.
>>
>>> Another:
>>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2022-02-16%2006%3A21%3A48
>>>
>>> --- /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/expected/rewrite.out
>>> 2022-02-14 20:19:14.000000000 +0000
>>> +++ /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/results/rewrite.out
>>> 2022-02-16 07:42:18.000000000 +0000
>>> @@ -126,6 +126,7 @@
>>> table public.replication_example: INSERT: id[integer]:4
>>> somedata[integer]:3 text[character varying]:null
>>> testcolumn1[integer]:null
>>> table public.replication_example: INSERT: id[integer]:5
>>> somedata[integer]:4 text[character varying]:null
>>> testcolumn1[integer]:2 testcolumn2[integer]:1
>>> COMMIT
>>> + sequence public.replication_example_id_seq: transactional:0
>>> last_value: 38 log_cnt: 0 is_called:1
>>> BEGIN
>>> table public.replication_example: INSERT: id[integer]:6
>>> somedata[integer]:5 text[character varying]:null
>>> testcolumn1[integer]:3 testcolumn2[integer]:null
>>> COMMIT
>>> @@ -133,7 +134,7 @@
>>> table public.replication_example: INSERT: id[integer]:7
>>> somedata[integer]:6 text[character varying]:null
>>> testcolumn1[integer]:4 testcolumn2[integer]:null
>>> table public.replication_example: INSERT: id[integer]:8
>>> somedata[integer]:7 text[character varying]:null
>>> testcolumn1[integer]:5 testcolumn2[integer]:null
>>> testcolumn3[integer]:1
>>> COMMIT
>>> - (15 rows)
>>> + (16 rows)
>>>
>>
>> Interesting. I can think of one reason that might cause this - we log
>> the first sequence increment after a checkpoint. So if a checkpoint
>> happens in an unfortunate place, there'll be an extra WAL record. On
>> slow / busy machines that's quite possible, I guess.
>>
>
> I've tweaked the checkpoint_interval to make checkpoints more aggressive
> (set it to 1s), and it seems my hunch was correct - it produces failures
> exactly like this one. The best fix probably is to just disable decoding
> of sequences in those tests that are not aimed at testing sequence decoding.
>

I've pushed a fix for this, adding "include-sequences=0" to a couple
test_decoding tests, which were failing with concurrent checkpoints.

Unfortunately, I realized we have a similar issue in the "sequences"
tests too :-( Imagine you do a series of sequence increments, e.g.

SELECT nextval('s') FROM generate_sequences(1,100);

If there's a concurrent checkpoint, this may add an extra WAL record,
affecting the decoded output (and also the data stored in the sequence
relation itself). Not sure what to do about this ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Gunnar "Nick" Bluth 2022-03-08 18:32:03 Re: [PATCH] pg_stat_toast v8
Previous Message Andres Freund 2022-03-08 18:18:37 cpluspluscheck complains about use of register