Re: logical decoding and replication of sequences, take 2

From: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>
To: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Subject: Re: logical decoding and replication of sequences, take 2
Date: 2023-07-20 07:24:21
Message-ID: CAExHW5t_sANOszFrE-Z=9Ya5Ob8W96e643UFZEX2A4kOzPpG5A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Thanks Tomas for the updated patches.

Here are my comments on 0006 patch as well as 0002 patch.

On Wed, Jul 19, 2023 at 4:23 PM Tomas Vondra
<tomas(dot)vondra(at)enterprisedb(dot)com> wrote:
>
> I think this is an accurate description of what the current patch does.
> And I think it's a reasonable behavior.
>
> My point is that if this gets released in PG17, it'll be difficult to
> change, even if it does not change on-disk format.
>

Yes. I agree. And I don't see any problem even if we are not able to change it.

>
> I need to think behavior about this a bit more, and maybe check how
> difficult would be implementing it.

Ok.

In most of the comments and in documentation, there are some phrases
which do not look accurate.

Change to a sequence is being refered to as "sequence increment". While
ascending sequences are common, PostgreSQL supports descending sequences as
well. The changes there will be decrements. But that's not the only case. A
sequence may be restarted with an older value, in which case the change could
increment or a decrement. I think correct usage is 'changes to sequence' or
'sequence changes'.

Sequence being assigned a new relfilenode is referred to as sequence
being created. This is confusing. When an existing sequence is ALTERed, we
will not "create" a new sequence but we will "create" a new relfilenode and
"assign" it to that sequence.

PFA such edits in 0002 and 0006 patches. Let me know if those look
correct. I think we
need similar changes to the documentation and comments in other places.

>
> I did however look at the proposed alternative to the "created" flag.
> The attached 0006 part ditches the flag with XLOG_SMGR_CREATE decoding.
> The smgr_decode code needs a review (I'm not sure the
> skipping/fast-forwarding part is correct), but it seems to be working
> fine overall, although we need to ensure the WAL record has the correct XID.
>

Briefly describing the patch. When decoding a XLOG_SMGR_CREATE WAL
record, it adds the relfilenode mentioned in it to the sequences hash.
When decoding a sequence change record, it checks whether the
relfilenode in the WAL record is in hash table. If it is the sequence
changes is deemed transactional otherwise non-transactional. The
change looks good to me. It simplifies the logic to decide whether a
sequence change is transactional or not.

In sequence_decode() we skip sequence changes when fast forwarding.
Given that smgr_decode() is only to supplement sequence_decode(), I
think it's correct to do the same in smgr_decode() as well. Simillarly
skipping when we don't have full snapshot.

Some minor comments on 0006 patch

+ /* make sure the relfilenode creation is associated with the XID */
+ if (XLogLogicalInfoActive())
+ GetCurrentTransactionId();

I think this change is correct and is inline with similar changes in 0002. But
I looked at other places from where DefineRelation() is called. For regular
tables it is called from ProcessUtilitySlow() which in turn does not call
GetCurrentTransactionId(). I am wondering whether we are just discovering a
class of bugs caused by not associating an xid with a newly created
relfilenode.

+ /*
+ * If we don't have snapshot or we are just fast-forwarding, there is no
+ * point in decoding changes.
+ */
+ if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT ||
+ ctx->fast_forward)
+ return;

This code block is repeated.

+void
+ReorderBufferAddRelFileLocator(ReorderBuffer *rb, TransactionId xid,
+ RelFileLocator rlocator)
+{
... snip ...
+
+ /* sequence changes require a transaction */
+ if (xid == InvalidTransactionId)
+ return;

IIUC, with your changes in DefineSequence() in this patch, this should not
happen. So this condition will never be true. But in case it happens, this code
will not add the relfilelocation to the hash table and we will deem the
sequence change as non-transactional. Isn't it better to just throw an error
and stop replication if that (ever) happens?

Also some comments on 0002 patch

@@ -405,8 +405,19 @@ fill_seq_fork_with_data(Relation rel, HeapTuple
tuple, ForkNumber forkNum)

/* check the comment above nextval_internal()'s equivalent call. */
if (RelationNeedsWAL(rel))
+ {
GetTopTransactionId();

+ /*
+ * Make sure the subtransaction has a XID assigned, so that
the sequence
+ * increment WAL record is properly associated with it. This
matters for
+ * increments of sequences created/altered in the
transaction, which are
+ * handled as transactional.
+ */
+ if (XLogLogicalInfoActive())
+ GetCurrentTransactionId();
+ }
+

I think we should separately commit the changes which add a call to
GetCurrentTransactionId(). That looks like an existing bug/anomaly
which can stay irrespective of this patch.

+ /*
+ * To support logical decoding of sequences, we require the sequence
+ * callback. We decide it here, but only check it later in the wrappers.
+ *
+ * XXX Isn't it wrong to define only one of those callbacks? Say we
+ * only define the stream_sequence_cb() - that may get strange results
+ * depending on what gets streamed. Either none or both?
+ *
+ * XXX Shouldn't sequence be defined at slot creation time, similar
+ * to two_phase? Probably not.
+ */

Do you intend to keep these XXX's as is? My previous comments on this comment
block are in [1].

In fact, given that whether or not sequences are replicated is decided by the
protocol version, do we really need LogicalDecodingContext::sequences? Drawing
parallel with WAL messages, I don't think it's needed.

[1] https://www.postgresql.org/message-id/CAExHW5vScYKKb0RZoiNEPfbaQ60hihfuWeLuZF4JKrwPJXPcUw%40mail.gmail.com

--
Best Wishes,
Ashutosh Bapat

Attachment Content-Type Size
0007-Edits-on-0002-patch-20230720.patch text/x-patch 6.7 KB
0008-Edit-on-patch-0006-20230720.patch text/x-patch 2.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bharath Rupireddy 2023-07-20 07:28:52 Re: Atomic ops for unlogged LSN
Previous Message Zhang Mingli 2023-07-20 07:06:32 Re: [feature]COPY FROM enable FORCE_NULL/FORCE_NOT_NULL on all columns