From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
Cc: | Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Add the replication origin name and commit-LSN to logical replication worker errcontext |
Date: | 2022-03-03 13:02:49 |
Message-ID: | CAD21AoAz76QEWhdbC45qDLXJscejZa2BOSJUKGnXXo6W2mPyQA@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Thu, Mar 3, 2022 at 3:37 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Wed, Mar 2, 2022 at 1:05 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Wed, Mar 2, 2022 at 4:14 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > I've attached updated patches.
> >
>
> The first patch LGTM. Some comments on the second patch:
>
> 1.
> @@ -3499,6 +3503,17 @@ ApplyWorkerMain(Datum main_arg)
> myslotname = MemoryContextStrdup(ApplyContext, syncslotname);
>
> pfree(syncslotname);
> +
> + /*
> + * Allocate the origin name in long-lived context for error context
> + * message
> + */
> + ReplicationOriginNameForTablesync(MySubscription->oid,
> + MyLogicalRepWorker->relid,
> + originname,
> + sizeof(originname));
> + apply_error_callback_arg.origin_name = MemoryContextStrdup(ApplyContext,
> + originname);
>
> Can we assign this in LogicalRepSyncTableStart() where we already
> forming the origin name? That will avoid this extra call to
> ReplicationOriginNameForTablesync.
Yes, but it requires to expose either apply_error_callback_arg or the
tablesync's origin name since the tablesync worker sets its origin
name in tablesync.c. I think it's better to avoid exposing them.
>
> 2.
> @@ -3657,30 +3679,37 @@ apply_error_callback(void *arg)
> errcontext("processing remote data during \"%s\"",
> logicalrep_message_type(errarg->command));
> else
> - errcontext("processing remote data during \"%s\" in transaction %u",
> + errcontext("processing remote data during \"%s\" in transaction %u
> committed at LSN %X/%X from replication origin \"%s\"",
> logicalrep_message_type(errarg->command),
> - errarg->remote_xid);
> + errarg->remote_xid,
> + LSN_FORMAT_ARGS(errarg->commit_lsn),
> + errarg->origin_name);
>
> Won't we set the origin name before the command? If so, it can be used
> in the first message as well and we can change the condition in the
> beginning such that if the origin or command is not set then we can
> return without adding additional context information.
Good point.
>
> Isn't this error message used for rollback prepared failure as well?
> If so, do we need to say "... finished at LSN ..." instead of "...
> committed at LSN ..."?
Right. Or can we just remove "committed" since the current message is
"transaction %u at %s"? That is , just replace the timestamp with LSN.
>
> The other point about this message is that saying ".... from
> replication origin ..." sounds more like remote information similar to
> publication but the origin is of the local node. How about slightly
> changing it to "processing remote data for replication origin \"%s\"
> during \"%s\" in transaction ..."?
Okay, so the modified message would be like:
"processing remote data for replication origin \"%s\" during \"%s\"
for replication target relation \"%s.%s\" column \"%s\" in transaction
%u finished at LSN %X/%X"
>
> 3.
> +</screen>
> + The LSN of the transaction that contains the change violating the
> constraint and
> + the replication origin name can be found from those outputs (LSN
> 0/14C0378 and
> + replication origin <literal>pg_16395</literal> in the above case).
> The transaction
> + can be skipped by calling the <link linkend="pg-replication-origin-advance">
> <function>pg_replication_origin_advance()</function></link> function with
> - a <parameter>node_name</parameter> corresponding to the subscription name,
> - and a position. The current position of origins can be seen in the
> - <link linkend="view-pg-replication-origin-status">
> + the <parameter>node_name</parameter> and the next LSN of the commit LSN
> + (i.e., 0/14C0379) from those outputs.
>
> After node_name, can we specify origin_name similar to what we do for
> LSN as that will make things more clear? Also, shall we mention that
> users need to disable subscriptions to perform replication origin
> advance?
Agreed.
>
> I think for prepared transactions, the user might need to use it twice
> because after skipping prepared xact, it will get an error again
> during the processing of commit prepared (no such prepare exists).
Good point.
> I
> thought of mentioning it but felt that might lead to specifying too
> many details which can confuse users as well. What do you think?
Given that this method of using pg_replication_origin_advance() is
normally not a preferable way, I think we might not need to mention
it. Also, it needs twice for one transaction but the steps are the
same.
>
> 4. There are places in the patch like apply_handle_stream_start()
> which sets commit_lsn in callback arg as InvalidXLogRecPtr but the
> callback function seems to be assuming that it is always a valid
> value. Shouldn't we need to avoid appending LSN for such cases?
Agreed. Will fix it in the next version patch.
I'm updating the patches and will submit them.
Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Andrew Dunstan | 2022-03-03 13:16:13 | Re: Rewriting the test of pg_upgrade as a TAP test - take three - remastered set |
Previous Message | Aleksander Alekseev | 2022-03-03 13:00:06 | Re: Problem with moderation of messages with patched attached. |