Re: logical decoding and replication of sequences, take 2

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Dilip Kumar <dilipbalaut(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2023-12-14 07:01:03
Message-ID: CAExHW5szMs=Wzy1Ek00rkByQbTk57nK_UvMp+VYwT9Np6LnO5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Dec 14, 2023 at 10:53 AM Dilip Kumar <dilipbalaut(at)gmail(dot)com> wrote:

> > >
> >
> > It is correct that we can make a wrong decision about whether a change
> > is transactional or non-transactional when sequence DDL happens before
> > the SNAPBUILD_FULL_SNAPSHOT state and the sequence operation happens
> > after that state. However, one thing to note here is that we won't try
> > to stream such a change because for non-transactional cases we don't
> > proceed unless the snapshot is in a consistent state. Now, if the
> > decision had been correct then we would probably have queued the
> > sequence change and discarded at commit.
> >
> > One thing that we deviate here is that for non-sequence transactional
> > cases (including logical messages), we immediately start queuing the
> > changes as soon as we reach SNAPBUILD_FULL_SNAPSHOT state (provided
> > SnapBuildProcessChange() returns true which is quite possible) and
> > take final decision at commit/prepare/abort time. However, that won't
> > be the case for sequences because of the dependency of determining
> > transactional cases on one of the prior records. Now, I am not
> > completely sure at this stage if such a deviation can cause any
> > problem and or whether we are okay to have such a deviation for
> > sequences.
>
> Okay, so this particular scenario that I raised is somehow saved, I
> mean although we are considering transactional sequence operation as
> non-transactional we also know that if some of the changes for a
> transaction are skipped because the snapshot was not FULL that means
> that transaction can not be streamed because that transaction has to
> be committed before snapshot become CONSISTENT (based on the snapshot
> state change machinery). Ideally based on the same logic that the
> snapshot is not consistent the non-transactional sequence changes are
> also skipped. But the only thing that makes me a bit uncomfortable is
> that even though the result is not wrong we have made some wrong
> intermediate decisions i.e. considered transactional change as
> non-transactions.
>
> One solution to this issue is that, even if the snapshot state does
> not reach FULL just add the sequence relids to the hash, I mean that
> hash is only maintained for deciding whether the sequence is changed
> in that transaction or not. So no adding such relids to hash seems
> like a root cause of the issue. Honestly, I haven't analyzed this
> idea in detail about how easy it would be to add only these changes to
> the hash and what are the other dependencies, but this seems like a
> worthwhile direction IMHO.

I also thought about the same solution. I tried this solution as the
attached patch on top of Hayato's diagnostic changes. Following log
messages are seen in server error log. Those indicate that the
sequence change was correctly deemed as a transactional change (line
2023-12-14 12:14:55.591 IST [321229] LOG: XXX: the sequence is
transactional).
2023-12-14 12:12:50.550 IST [321229] ERROR: relation
"pg_replication_slot" does not exist at character 15
2023-12-14 12:12:50.550 IST [321229] STATEMENT: select * from
pg_replication_slot;
2023-12-14 12:12:57.289 IST [321229] LOG: logical decoding found
initial starting point at 0/1598D50
2023-12-14 12:12:57.289 IST [321229] DETAIL: Waiting for transactions
(approximately 1) older than 759 to end.
2023-12-14 12:12:57.289 IST [321229] STATEMENT: SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');
2023-12-14 12:13:49.551 IST [321229] LOG: XXX: smgr_decode. snapshot
is SNAPBUILD_BUILDING_SNAPSHOT
2023-12-14 12:13:49.551 IST [321229] STATEMENT: SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');
2023-12-14 12:13:49.551 IST [321229] LOG: XXX: seq_decode. snapshot is
SNAPBUILD_BUILDING_SNAPSHOT
2023-12-14 12:13:49.551 IST [321229] STATEMENT: SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');
2023-12-14 12:13:49.551 IST [321229] LOG: XXX: skipped
2023-12-14 12:13:49.551 IST [321229] STATEMENT: SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');
2023-12-14 12:13:49.552 IST [321229] LOG: logical decoding found
initial consistent point at 0/1599170
2023-12-14 12:13:49.552 IST [321229] DETAIL: Waiting for transactions
(approximately 1) older than 760 to end.
2023-12-14 12:13:49.552 IST [321229] STATEMENT: SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');
2023-12-14 12:14:55.591 IST [321229] LOG: XXX: seq_decode. snapshot is
SNAPBUILD_FULL_SNAPSHOT
2023-12-14 12:14:55.591 IST [321230] STATEMENT: SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');
2023-12-14 12:14:55.591 IST [321229] LOG: XXX: the sequence is transactional
2023-12-14 12:14:55.591 IST [321229] STATEMENT: SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');
2023-12-14 12:14:55.813 IST [321229] LOG: logical decoding found
consistent point at 0/15992E8
2023-12-14 12:14:55.813 IST [321229] DETAIL: There are no running transactions.
2023-12-14 12:14:55.813 IST [321229] STATEMENT: SELECT
pg_create_logical_replication_slot('slot', 'test_decoding');

It looks like the solution works. But this is the only place where we
process a change before SNAPSHOT reaches FULL. But this is also the
only record which affects a decision to queue/not a following change.
So it should be ok. The sequence_hash'es as separate for each
transaction and they are cleaned when processing COMMIT record. So I
think we don't have any side effects of adding relfilenode to sequence
hash even though snapshot is not FULL.

As a side note
1. the prologue of ReorderBufferSequenceCleanup() mentions only abort,
but this function will be called for COMMIT as well. Prologue needs to
be fixed.
2. Now that sequence hashes are per transaction, do we need
ReoderBufferTXN in ReorderBufferSequenceEnt?

--
Best Wishes,
Ashutosh Bapat

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dilip Kumar 2023-12-14 07:07:07 Re: logical decoding and replication of sequences, take 2
Previous Message Matthias van de Meent 2023-12-14 06:21:30 Re: Reducing output size of nodeToString