Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add the replication origin name and commit-LSN to logical replication worker errcontext
Date: 2022-03-03 06:37:05
Message-ID: CAA4eK1+V51Rcy1Pg0u2srqKvG1u_6RUz4=MuC0Wt3+smoObn9w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Mar 2, 2022 at 1:05 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Mar 2, 2022 at 4:14 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> I've attached updated patches.
>

The first patch LGTM. Some comments on the second patch:

1.
@@ -3499,6 +3503,17 @@ ApplyWorkerMain(Datum main_arg)
myslotname = MemoryContextStrdup(ApplyContext, syncslotname);

pfree(syncslotname);
+
+ /*
+ * Allocate the origin name in long-lived context for error context
+ * message
+ */
+ ReplicationOriginNameForTablesync(MySubscription->oid,
+ MyLogicalRepWorker->relid,
+ originname,
+ sizeof(originname));
+ apply_error_callback_arg.origin_name = MemoryContextStrdup(ApplyContext,
+ originname);

Can we assign this in LogicalRepSyncTableStart() where we already
forming the origin name? That will avoid this extra call to
ReplicationOriginNameForTablesync.

2.
@@ -3657,30 +3679,37 @@ apply_error_callback(void *arg)
errcontext("processing remote data during \"%s\"",
logicalrep_message_type(errarg->command));
else
- errcontext("processing remote data during \"%s\" in transaction %u",
+ errcontext("processing remote data during \"%s\" in transaction %u
committed at LSN %X/%X from replication origin \"%s\"",
logicalrep_message_type(errarg->command),
- errarg->remote_xid);
+ errarg->remote_xid,
+ LSN_FORMAT_ARGS(errarg->commit_lsn),
+ errarg->origin_name);

Won't we set the origin name before the command? If so, it can be used
in the first message as well and we can change the condition in the
beginning such that if the origin or command is not set then we can
return without adding additional context information.

Isn't this error message used for rollback prepared failure as well?
If so, do we need to say "... finished at LSN ..." instead of "...
committed at LSN ..."?

The other point about this message is that saying ".... from
replication origin ..." sounds more like remote information similar to
publication but the origin is of the local node. How about slightly
changing it to "processing remote data for replication origin \"%s\"
during \"%s\" in transaction ..."?

3.
+</screen>
+ The LSN of the transaction that contains the change violating the
constraint and
+ the replication origin name can be found from those outputs (LSN
0/14C0378 and
+ replication origin <literal>pg_16395</literal> in the above case).
The transaction
+ can be skipped by calling the <link linkend="pg-replication-origin-advance">
<function>pg_replication_origin_advance()</function></link> function with
- a <parameter>node_name</parameter> corresponding to the subscription name,
- and a position. The current position of origins can be seen in the
- <link linkend="view-pg-replication-origin-status">
+ the <parameter>node_name</parameter> and the next LSN of the commit LSN
+ (i.e., 0/14C0379) from those outputs.

After node_name, can we specify origin_name similar to what we do for
LSN as that will make things more clear? Also, shall we mention that
users need to disable subscriptions to perform replication origin
advance?

I think for prepared transactions, the user might need to use it twice
because after skipping prepared xact, it will get an error again
during the processing of commit prepared (no such prepare exists). I
thought of mentioning it but felt that might lead to specifying too
many details which can confuse users as well. What do you think?

4. There are places in the patch like apply_handle_stream_start()
which sets commit_lsn in callback arg as InvalidXLogRecPtr but the
callback function seems to be assuming that it is always a valid
value. Shouldn't we need to avoid appending LSN for such cases?

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Westermann (DWE) 2022-03-03 06:55:43 Re: Changing "Hot Standby" to "hot standby"
Previous Message Michael Paquier 2022-03-03 06:35:02 Re: Proposal: Support custom authentication methods using hooks