Re: logical decoding and replication of sequences

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: logical decoding and replication of sequences
Date: 2022-03-07 16:53:27
Message-ID: 5bd18c76-c38b-86e1-78d3-73c44ac59a7f@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/28/22 12:46, Amit Kapila wrote:
> On Sat, Feb 12, 2022 at 6:04 AM Tomas Vondra
> <tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>>
>> On 2/10/22 19:17, Tomas Vondra wrote:
>>> I've polished & pushed the first part adding sequence decoding
>>> infrastructure etc. Attached are the two remaining parts.
>>>
>>> I plan to wait a day or two and then push the test_decoding part. The
>>> last part (for built-in replication) will need more work and maybe
>>> rethinking the grammar etc.
>>>
>>
>> I've pushed the second part, adding sequences to test_decoding.
>>
>
> The test_decoding is failing randomly in the last few days. I am not
> completely sure but they might be related to this work. The two of
> these appears to be due to the same reason:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2022-02-25%2018%3A50%3A09
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=locust&dt=2022-02-17%2015%3A17%3A07
>
> TRAP: FailedAssertion("prev_first_lsn < cur_txn->first_lsn", File:
> "reorderbuffer.c", Line: 1173, PID: 35013)
> 0 postgres 0x00593de0 ExceptionalCondition + 160\\0
>

This might be related to the issue reported by Amit, i.e. that
sequence_decode does not call ReorderBufferProcessXid(). If this keeps
failing, we'll have to add some extra debug info (logging LSN etc.), at
least temporarily. It'd be valuable to inspect the WAL too.

> Another:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=mandrill&dt=2022-02-16%2006%3A21%3A48
>
> --- /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/expected/rewrite.out
> 2022-02-14 20:19:14.000000000 +0000
> +++ /home/nm/farm/xlc32/HEAD/pgsql.build/contrib/test_decoding/results/rewrite.out
> 2022-02-16 07:42:18.000000000 +0000
> @@ -126,6 +126,7 @@
> table public.replication_example: INSERT: id[integer]:4
> somedata[integer]:3 text[character varying]:null
> testcolumn1[integer]:null
> table public.replication_example: INSERT: id[integer]:5
> somedata[integer]:4 text[character varying]:null
> testcolumn1[integer]:2 testcolumn2[integer]:1
> COMMIT
> + sequence public.replication_example_id_seq: transactional:0
> last_value: 38 log_cnt: 0 is_called:1
> BEGIN
> table public.replication_example: INSERT: id[integer]:6
> somedata[integer]:5 text[character varying]:null
> testcolumn1[integer]:3 testcolumn2[integer]:null
> COMMIT
> @@ -133,7 +134,7 @@
> table public.replication_example: INSERT: id[integer]:7
> somedata[integer]:6 text[character varying]:null
> testcolumn1[integer]:4 testcolumn2[integer]:null
> table public.replication_example: INSERT: id[integer]:8
> somedata[integer]:7 text[character varying]:null
> testcolumn1[integer]:5 testcolumn2[integer]:null
> testcolumn3[integer]:1
> COMMIT
> - (15 rows)
> + (16 rows)
>

Interesting. I can think of one reason that might cause this - we log
the first sequence increment after a checkpoint. So if a checkpoint
happens in an unfortunate place, there'll be an extra WAL record. On
slow / busy machines that's quite possible, I guess.

I wonder if these two issues might be related ...

regards

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David G. Johnston 2022-03-07 16:58:22 Re: role self-revocation
Previous Message Tomas Vondra 2022-03-07 16:39:16 Re: logical decoding and replication of sequences