Re: Add the replication origin name and commit-LSN to logical replication worker errcontext

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Add the replication origin name and commit-LSN to logical replication worker errcontext
Date: 2022-03-03 13:02:49
Message-ID: CAD21AoAz76QEWhdbC45qDLXJscejZa2BOSJUKGnXXo6W2mPyQA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 3, 2022 at 3:37 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Mar 2, 2022 at 1:05 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 2, 2022 at 4:14 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > I've attached updated patches.
> >
>
> The first patch LGTM. Some comments on the second patch:
>
> 1.
> @@ -3499,6 +3503,17 @@ ApplyWorkerMain(Datum main_arg)
> myslotname = MemoryContextStrdup(ApplyContext, syncslotname);
>
> pfree(syncslotname);
> +
> + /*
> + * Allocate the origin name in long-lived context for error context
> + * message
> + */
> + ReplicationOriginNameForTablesync(MySubscription->oid,
> + MyLogicalRepWorker->relid,
> + originname,
> + sizeof(originname));
> + apply_error_callback_arg.origin_name = MemoryContextStrdup(ApplyContext,
> + originname);
>
> Can we assign this in LogicalRepSyncTableStart() where we already
> forming the origin name? That will avoid this extra call to
> ReplicationOriginNameForTablesync.

Yes, but it requires to expose either apply_error_callback_arg or the
tablesync's origin name since the tablesync worker sets its origin
name in tablesync.c. I think it's better to avoid exposing them.

>
> 2.
> @@ -3657,30 +3679,37 @@ apply_error_callback(void *arg)
> errcontext("processing remote data during \"%s\"",
> logicalrep_message_type(errarg->command));
> else
> - errcontext("processing remote data during \"%s\" in transaction %u",
> + errcontext("processing remote data during \"%s\" in transaction %u
> committed at LSN %X/%X from replication origin \"%s\"",
> logicalrep_message_type(errarg->command),
> - errarg->remote_xid);
> + errarg->remote_xid,
> + LSN_FORMAT_ARGS(errarg->commit_lsn),
> + errarg->origin_name);
>
> Won't we set the origin name before the command? If so, it can be used
> in the first message as well and we can change the condition in the
> beginning such that if the origin or command is not set then we can
> return without adding additional context information.

Good point.

>
> Isn't this error message used for rollback prepared failure as well?
> If so, do we need to say "... finished at LSN ..." instead of "...
> committed at LSN ..."?

Right. Or can we just remove "committed" since the current message is
"transaction %u at %s"? That is , just replace the timestamp with LSN.

>
> The other point about this message is that saying ".... from
> replication origin ..." sounds more like remote information similar to
> publication but the origin is of the local node. How about slightly
> changing it to "processing remote data for replication origin \"%s\"
> during \"%s\" in transaction ..."?

Okay, so the modified message would be like:

"processing remote data for replication origin \"%s\" during \"%s\"
for replication target relation \"%s.%s\" column \"%s\" in transaction
%u finished at LSN %X/%X"

>
> 3.
> +</screen>
> + The LSN of the transaction that contains the change violating the
> constraint and
> + the replication origin name can be found from those outputs (LSN
> 0/14C0378 and
> + replication origin <literal>pg_16395</literal> in the above case).
> The transaction
> + can be skipped by calling the <link linkend="pg-replication-origin-advance">
> <function>pg_replication_origin_advance()</function></link> function with
> - a <parameter>node_name</parameter> corresponding to the subscription name,
> - and a position. The current position of origins can be seen in the
> - <link linkend="view-pg-replication-origin-status">
> + the <parameter>node_name</parameter> and the next LSN of the commit LSN
> + (i.e., 0/14C0379) from those outputs.
>
> After node_name, can we specify origin_name similar to what we do for
> LSN as that will make things more clear? Also, shall we mention that
> users need to disable subscriptions to perform replication origin
> advance?

Agreed.

>
> I think for prepared transactions, the user might need to use it twice
> because after skipping prepared xact, it will get an error again
> during the processing of commit prepared (no such prepare exists).

Good point.

> I
> thought of mentioning it but felt that might lead to specifying too
> many details which can confuse users as well. What do you think?

Given that this method of using pg_replication_origin_advance() is
normally not a preferable way, I think we might not need to mention
it. Also, it needs twice for one transaction but the steps are the
same.

>
> 4. There are places in the patch like apply_handle_stream_start()
> which sets commit_lsn in callback arg as InvalidXLogRecPtr but the
> callback function seems to be assuming that it is always a valid
> value. Shouldn't we need to avoid appending LSN for such cases?

Agreed. Will fix it in the next version patch.

I'm updating the patches and will submit them.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2022-03-03 13:16:13 Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set
Previous Message Aleksander Alekseev 2022-03-03 13:00:06 Re: Problem with moderation of messages with patched attached.