Re: Skipping logical replication transactions on subscriber side

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Skipping logical replication transactions on subscriber side
Date: 2021-12-01 04:00:18
Message-ID: CAA4eK1JOsqPHZTOeifp+Q8pWgrn1qzks3ojZnSHp-1QZFcjyQQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, Dec 1, 2021 at 9:12 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Wed, Dec 1, 2021 at 12:22 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Wed, Dec 1, 2021 at 8:24 AM houzj(dot)fnst(at)fujitsu(dot)com
> > <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> > >
> > > I have a question about the testcase (I could be wrong here).
> > >
> > > Is it possible that the race condition happens between the apply worker
> > > (test_tab1) and the table sync worker (test_tab2)? If so, it seems the
> > > error ("replication origin with OID") could happen randomly until we
> > > resolve the conflict. Based on this, for the following code:
> > > -----
> > > # Wait for the error statistics to be updated.
> > > my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql;
> > > $node->poll_query_until(
> > > 'postgres', $check_sql,
> > > ) or die "Timed out while waiting for statistics to be updated";
> > >
> > > * [1] *
> > >
> > > $check_sql =
> > > qq[
> > > SELECT subname, last_error_command, last_error_relid::regclass,
> > > last_error_count > 0 ] . $part_sql;
> > > my $result = $node->safe_psql('postgres', $check_sql);
> > > is($result, $expected, $msg);
> > > -----
> > >
> > > Is it possible that the error ("replication origin with OID") happens again
> > > at place [1]? In that case, could the error message we have checked be
> > > replaced by another error ("replication origin ...") and the test fail?
> > >
> >
> > Once the apply worker has hit the "duplicate key violation ..." error
> > before * [1] *, we shouldn't get the replication origin-specific error,
> > because the origin setup is done before it starts applying changes.
>
> Right.
>
> > Also, even if that or some other error happens after * [1] *, the test
> > should still succeed because of the errmsg_prefix check.
>
> In this case, the old error ("duplicate key violation ...") is
> overwritten by a new error (e.g., a connection error; not sure how
> likely that is)
>

Yeah, or perhaps some memory allocation failure. I think the
probability of such failures is very low, but OTOH, why take the chance?

> and the test fails because the query returns no
> entries, no?
>

Right.

> If so, the result from the second check_sql is unstable,
> and it's probably better to check the result only once. That is, the
> first check_sql includes the command, and we exit from the function
> once we confirm the error entry has been updated as expected.
>

Yeah, I think that should be fine.
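For illustration, a minimal sketch of that single-check approach. It assumes a helper that receives the node object plus the $part_sql, $expected and $msg values used by the existing test; the helper name test_subscription_error and its signature are hypothetical here. It relies on poll_query_until's optional third argument (the expected query output, default 't'), so every column we care about is compared inside the polling loop and nothing is re-queried afterwards.

```perl
use strict;
use warnings;
use Test::More;

# Hypothetical helper: name and signature are assumptions for illustration.
# $node is expected to be a PostgreSQL::Test::Cluster (formerly PostgresNode)
# instance; $part_sql is the FROM/WHERE tail built by the existing test.
sub test_subscription_error
{
    my ($node, $part_sql, $expected, $msg) = @_;

    # Fold every column we want to verify into the polled query, so there
    # is no second, separate SELECT that a later error could race with.
    my $check_sql = qq[
SELECT subname, last_error_command, last_error_relid::regclass,
       last_error_count > 0 ] . $part_sql;

    # poll_query_until returns true as soon as the query output equals
    # $expected, and false if it times out first.
    ok($node->poll_query_until('postgres', $check_sql, $expected), $msg);
    return;
}
```

Once the expected row has been seen, the helper returns, so an error that later overwrites the entry can no longer make the test fail.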

With Regards,
Amit Kapila.
