Re: logical decoding and replication of sequences, take 2

From: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2024-01-11 16:26:56
Message-ID: 09613730-5ee9-4cc3-82d8-f089be90aa64@enterprisedb.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

Here's new version of this patch series. It rebases the 2023/12/03
version, and there's a couple improvements to address the performance
and correctness questions.

Since the 2023/12/03 version was posted, there were a couple off-list
discussions with several people - with Amit, as mentioned in [1], and
then also internally and at pgconf.eu.

My personal (very brief) takeaway from these discussions is this:

1) desirability: We want a built-in way to handle sequences in logical
replication. I think everyone agrees this is not a way to do distributed
sequences in an active-active setups, but that there are other use cases
that need this feature - typically upgrades / logical failover.

Multiple approaches were discussed (support in logical replication or a
separate tool to be executed on the logical replica). Both might work,
people usually end up with some sort of custom tool anyway. But it's
cumbersome, and the consensus seems the logical rep feature is better.

2) performance: There was concern about the performance impact, and that
it affects everyone, including those who don't replicate sequences (as
the overhead is mostly incurred before calls to output plugin etc.).

I do agree with this, but I don't think sequences can be decoded in a
much cheaper way. There was a proposal [2] that maybe we could batch the
non-transactional sequences changes in the "next" transaction, and
distribute them similarly to SnapBuildDistributeNewCatalogSnapshot()
distributes catalog snapshots.

But I doubt that'd actually work. Or more precisely - if we can make the
code work, I think it would not solve the issue for some common cases.
Consider for example a case with many concurrent top-level transactions,
making this quite expensive. And I'd bet sequence changes are far more
common than catalog changes.

However, I think we ultimately agreed that the overhead is acceptable if
it only applies to use cases that actually need to decode sequences. So
if there was a way to skip sequence decoding when not necessary, that
would work. Unfortunately, that can't be based on simply checking which
callbacks are defined by the output plugin, because e.g. pgoutput needs
to handle both cases (so the callbacks need to be defined). Nor it can
be determined based on what's included in the publication (as that's not
available that early).

The agreement was that the best way is to have a CREATE SUBSCRIPTION
option that would instruct the upstream to decode sequences. By default
this option is 'off' (because that's the no-overhead case), but it can
be enabled for each subscription.

This is what 0005 implements, and interestingly enough, this is what an
earlier version [3] from 2023/04/02 did.

This means that if you add a sequence to the publication, but leave
"sequences=off" in CREATE SUBSCRIPTION, the sequence won't be replicated
after all. That may seems a bit surprising, and I don't like it, but I
don't think there's a better way to do this.

3) correctness: The last point is about making "transactional" flag
correct when the snapshot state changes mid-transaction, originally
pointed out by Dilip [4]. Per [5] this however happens to work
correctly, because while we identify the change as 'non-transactional'
(which is incorrect), we immediately throw it again (so we don't try to
apply it, which would error-out).

One option would be to document/describe this in the comments, per 0006.
This means that when ReorderBufferSequenceIsTransactional() returns
true, it's correct. But if it returns 'false', it means 'maybe'. I agree
it seems a bit strange, but with the extra comments I think it's OK. It
simply means that if we get transactional=false incorrectly, we're
guaranteed to not process it. Maybe we could rename the function to make
this clear from the name.

The other solution proposed in the thread [6] was to always decode the
relfilenode, and add it to the hash table. 0007 does this, and it works.
But I agree this seems possibly worse than 0006 - it means we may be
adding entries to the hash table, and it's not clear when exactly we'll
clean them up etc. It'd be the only place processing stuff before the
snapshots reaches FULL.

I personally would go with 0006, i.e. just explaining why doing it this
way is correct.

regards

[1]
https://www.postgresql.org/message-id/12822961-b7de-9d59-dd27-2e3dc3980c7e%40enterprisedb.com

[2]
https://www.postgresql.org/message-id/CAFiTN-vm3-bGfm-uJdzRLERMHozW8xjZHu4rdmtWR-rP-SJYMQ%40mail.gmail.com

[3]
https://www.postgresql.org/message-id/1f96b282-cb90-8302-cee8-7b3f5576a31c%40enterprisedb.com

[4]
https://www.postgresql.org/message-id/CAFiTN-vAx-Y%2B19ROKOcWnGf7ix2VOTUebpzteaGw9XQyCAeK6g%40mail.gmail.com

[5]
https://www.postgresql.org/message-id/CAA4eK1LFise9iN%2BNN%3Dagrk4prR1qD%2BebvzNjKAWUog2%2Bhy3HxQ%40mail.gmail.com

[6]
https://www.postgresql.org/message-id/CAFiTN-sYpyUBabxopJysqH3DAp4OZUCTi6m_qtgt8d32vDcWSA%40mail.gmail.com

--
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
v20240111-0001-Logical-decoding-of-sequences.patch text/x-patch 69.0 KB
v20240111-0002-Add-decoding-of-sequences-to-test_decoding.patch text/x-patch 20.4 KB
v20240111-0003-Add-decoding-of-sequences-to-built-in-repl.patch text/x-patch 269.4 KB
v20240111-0004-global-hash-table-of-sequence-relfilenodes.patch text/x-patch 12.1 KB
v20240111-0005-CREATE-SUBSCRIPTION-flag-to-enable-sequenc.patch text/x-patch 19.0 KB
v20240111-0006-add-comment-about-SNAPBUILD_FULL.patch text/x-patch 1.9 KB
v20240111-0007-decode-all-sequence-relfilenodes.patch text/x-patch 4.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2024-01-11 16:33:29 Re: A failure in t/038_save_logical_slots_shutdown.pl
Previous Message Jelte Fennema-Nio 2024-01-11 16:20:25 Re: Add new protocol message to change GUCs for usage with future protocol-only GUCs