Re: Skipping logical replication transactions on subscriber side

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-08-02 02:15:03
Message-ID: CAD21AoCO3TED2cSche4iGpYekpyE-rwzput0oDW3YNvoXNkrQQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jul 30, 2021 at 12:52 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Jul 29, 2021 at 11:18 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Thu, Jul 29, 2021 at 2:04 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > > Yeah, it seems to be introduced by commit 0926e96c493. I've attached
> > > > the patch for that.
> > > >
> > > > Also, I've attached the updated version patches. This version patch
> > > > has pg_stat_reset_subscription_error() SQL function and sends a clear
> > > > message after skipping the transaction. 0004 patch includes the
> > > > skipping transaction feature and introducing RESET to ALTER
> > > > SUBSCRIPTION. It would be better to separate them.
> > > >
>
> +1, to separate out the reset part.

Okay, I'll do that.

>
> > >
> > > I've attached the new version patches that fix cfbot failure.
> >
> > Sorry I've attached wrong ones. Reattached the correct version patches.
> >
>
> Pushed the 0001* patch that removes the unused parameter.

Thanks!

>
> Few comments on v4-0001-Add-errcontext-to-errors-of-the-applying-logical-
> ===========================================================

Thank you for the comments!

> 1.
> --- a/src/backend/commands/tablecmds.c
> +++ b/src/backend/commands/tablecmds.c
> @@ -78,6 +78,7 @@
> #include "partitioning/partbounds.h"
> #include "partitioning/partdesc.h"
> #include "pgstat.h"
> +#include "replication/logicalworker.h"
> #include "rewrite/rewriteDefine.h"
> #include "rewrite/rewriteHandler.h"
> #include "rewrite/rewriteManip.h"
> @@ -1899,6 +1900,9 @@ ExecuteTruncateGuts(List *explicit_rels,
> continue;
> }
>
> + /* Set logical replication error callback info if necessary */
> + set_logicalrep_error_context_rel(rel);
> +
> /*
> * Build the lists of foreign tables belonging to each foreign server
> * and pass each list to the foreign data wrapper's callback function,
> @@ -2006,6 +2010,9 @@ ExecuteTruncateGuts(List *explicit_rels,
> pgstat_count_truncate(rel);
> }
>
> + /* Reset logical replication error callback info */
> + reset_logicalrep_error_context_rel();
> +
>
> Setting up logical rep error context in a generic function looks a bit
> odd to me. Do we really need to set up error context here? I
> understand we can't do this in caller but anyway I think we are not
> sending this to logical replication view as well, so not sure we need
> to do it here.

Yeah, I'm not convinced of this part yet. I wanted to show relid also
in truncate cases but I came up with only this idea.

If an error happens during truncating the table (in
ExecuteTruncateGuts()), relid set by
set_logicalrep_error_context_rel() is actually sent to the view. If we
don’t have it, the view always shows relid as NULL in truncate cases.
On the other hand, it doesn’t cover all cases. For example, it doesn’t
cover an error that the target table doesn’t exist on the subscriber,
which happens when opening the target table. Anyway, in most cases,
even if relid is NULL, the error message in the view helps users to
know which relation the error happened on. What do you think?

>
> 2.
> +/* Struct for saving and restoring apply information */
> +typedef struct ApplyErrCallbackArg
> +{
> + LogicalRepMsgType command; /* 0 if invalid */
> +
> + /* Local relation information */
> + char *nspname; /* used for error context */
> + char *relname; /* used for error context */
> +
> + TransactionId remote_xid;
> + TimestampTz committs;
> +} ApplyErrCallbackArg;
> +static ApplyErrCallbackArg apply_error_callback_arg =
> +{
> + .command = 0,
> + .relname = NULL,
> + .nspname = NULL,
> + .remote_xid = InvalidTransactionId,
> + .committs = 0,
> +};
> +
>
> Better to have a space between the above two declarations.

Will fix.

>
> 3. commit message:
> This commit adds the error context to errors happening during applying
> logical replication changes, showing the command, the relation
> relation, transaction ID, and commit timestamp in the server log.
>
> 'relation' is mentioned twice.

Will fix.

>
> The patch is not getting applied probably due to yesterday's commit in
> this area.

Okay. I'll rebase the patches to the current HEAD.

I'm incorporating all comments from you and Houzj, and will submit the
new patch soon.

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Masahiko Sawada 2021-08-02 02:36:12 Re: Skipping logical replication transactions on subscriber side
Previous Message Tom Lane 2021-08-02 01:02:20 Re: [PATCH] proposal for regexp_count, regexp_instr, regexp_substr and regexp_replace