Re: logical decoding and replication of sequences

From: Zheng Li <zhengli10(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Petr Jelinek <petr(dot)jelinek(at)enterprisedb(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: logical decoding and replication of sequences
Date: 2022-05-25 20:42:29
Message-ID: CAAD30U+FD+QprMqY8wPo2n+3eS7B8=f2XYNjrUC9Sq-OFp6zDA@mail.gmail.com
Lists: pgsql-hackers

> But of course, if we expect/require to have a perfect snapshot for that
> exact position in the transaction, this won't work. IMO the whole idea
> that we can have non-transactional bits in naturally transactional
> decoding seems a bit suspicious (at least in hindsight).
>
> No matter what we do for sequences, though, this still affects logical
> messages too. Not sure what to do there :-(

Hi, I spent some time trying to understand this problem while
evaluating its impact on DDL replication in [1]. I think for DDL we
could always remove the non-transactional bits, since DDL will probably
always be processed transactionally.

I attempted to solve the problem for messages. A potential solution is
to keep track of the end LSN of the last decoded/acked non-transactional
message/operation and use it to check whether a non-transactional
message record should be skipped during decoding. To do that, I added
three new fields that hold this end LSN:

ReplicationSlotPersistentData.non_xact_op_at,
XLogReaderState.NonXactOpRecPtr and
SnapBuild.start_decoding_nonxactop_at.

I verified that this approach solves the issue of non-transactional
messages not being decoded under concurrency, before the builder state
reaches SNAPBUILD_CONSISTENT. Once the builder state reaches
SNAPBUILD_CONSISTENT, the new field
ReplicationSlotPersistentData.non_xact_op_at can be set to
ReplicationSlotPersistentData.confirmed_flush.
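
To make the skip check concrete, here is a minimal standalone C sketch.
It is not taken from the patch: SlotSketch and the three helper
functions are hypothetical names, XLogRecPtr is redefined locally as a
plain 64-bit integer, and the "<=" comparison is my assumption about how
"already decoded/acked" would be tested. Only the field names
non_xact_op_at and confirmed_flush come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins so the sketch compiles outside the PostgreSQL tree. */
typedef uint64_t XLogRecPtr;
#define InvalidXLogRecPtr ((XLogRecPtr) 0)

/* Hypothetical, simplified view of the slot state discussed above. */
typedef struct SlotSketch
{
    XLogRecPtr confirmed_flush;  /* existing slot field */
    XLogRecPtr non_xact_op_at;   /* end LSN of last non-xact op decoded/acked */
} SlotSketch;

/*
 * Decide whether a non-transactional record ending at record_end was
 * already decoded/acked and should therefore be skipped.
 * Assumption: a record is "already handled" if it ends at or before
 * the tracked LSN.
 */
static bool
nonxact_op_should_skip(const SlotSketch *slot, XLogRecPtr record_end)
{
    assert(slot != NULL);

    /* Nothing tracked yet, so nothing can have been handled. */
    if (slot->non_xact_op_at == InvalidXLogRecPtr)
        return false;

    return record_end <= slot->non_xact_op_at;
}

/* Remember the end LSN of a non-transactional record once it is handled. */
static void
nonxact_op_mark_decoded(SlotSketch *slot, XLogRecPtr record_end)
{
    assert(slot != NULL);

    /* Only ever move the tracked position forward. */
    if (record_end > slot->non_xact_op_at)
        slot->non_xact_op_at = record_end;
}

/*
 * Once the snapshot builder is consistent, the tracked position can be
 * set to confirmed_flush, as described above.
 */
static void
nonxact_op_on_consistent(SlotSketch *slot)
{
    assert(slot != NULL);
    slot->non_xact_op_at = slot->confirmed_flush;
}
```

The point of a separate position is that a non-transactional record can
be decoded before the consistent point is reached and still not be
emitted twice afterwards.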

Similar to the sequence issue, here is the test case for logical messages:

The test runs three concurrent sessions so that pg_logical_emit_message
in Session-2 is executed before we reach a consistent point, while its
commit happens after the consistent point:

Session-2:

Begin;
SELECT pg_current_xact_id();

Session-1:
SELECT 'init' FROM pg_create_logical_replication_slot('test_slot',
'test_decoding', false, true);

Session-3:

Begin;
SELECT pg_current_xact_id();

Session-2:

Commit;
Begin;
SELECT pg_logical_emit_message(true, 'test_decoding', 'msg1');
SELECT pg_logical_emit_message(false, 'test_decoding', 'msg2');

Session-3:

Commit;

Session-1: (at this point, the session will crash without the fix)

SELECT data FROM pg_logical_slot_get_changes('test_slot', NULL, NULL,
'force-binary', '0', 'skip-empty-xacts', '1');
data
---------------------------------------------------------------------
message: transactional: 0 prefix: test_decoding, sz: 4 content:msg2

Session-2:

Commit;

Session-1:

SELECT data FROM pg_logical_slot_get_changes('test_slot', NULL,
NULL, 'force-binary', '0', 'skip-empty-xacts', '1');
data
---------------------------------------------------------------------
message: transactional: 1 prefix: test_decoding, sz: 4 content:msg1

I also tried the same approach on sequences (on a commit before the
revert of sequence replication). It seems to work, but I think it needs
further testing.

The attached patch
0001-Intorduce-new-field-ReplicationSlotPersistentData.no.patch
contains the fix for logical messages and applies on master.

[1] https://www.postgresql.org/message-id/flat/CAAD30U+pVmfKwUKy8cbZOnUXyguJ-uBNejwD75Kyo=OjdQGJ9g(at)mail(dot)gmail(dot)com

Thoughts?

With Regards,
Zheng

Attachment: 0001-Intorduce-new-field-ReplicationSlotPersistentData.no.patch
(application/octet-stream, 12.9 KB)
