RE: Skipping logical replication transactions on subscriber side

From: "houzj(dot)fnst(at)fujitsu(dot)com" <houzj(dot)fnst(at)fujitsu(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Greg Nancarrow <gregn4422(at)gmail(dot)com>, "tanghy(dot)fnst(at)fujitsu(dot)com" <tanghy(dot)fnst(at)fujitsu(dot)com>, "osumi(dot)takamichi(at)fujitsu(dot)com" <osumi(dot)takamichi(at)fujitsu(dot)com>, Alexey Lesovsky <lesovsky(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: RE: Skipping logical replication transactions on subscriber side
Date: 2021-12-01 03:39:20
Message-ID: OS0PR01MB57167628015DE3390C9AF6CE94689@OS0PR01MB5716.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

On Wed, Dec 1, 2021 11:22 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> On Wed, Dec 1, 2021 at 8:24 AM houzj(dot)fnst(at)fujitsu(dot)com
> <houzj(dot)fnst(at)fujitsu(dot)com> wrote:
> >
> > On Tues, Nov 30, 2021 9:39 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > > Shouldn't we somehow check that the error message also starts with
> > > > > "duplicate key value violates ..."?
> > > >
> > > > Yeah, I think it's a good idea to make the checks more specific. That
> > > > is, probably we can specify the prefix of the error message and
> > > > subrelid in addition to the current conditions: relid and xid. That
> > > > way, we can check what error was reported by which workers (tablesync
> > > > or apply) for which relations. And both check queries in
> > > > test_subscription_error() can have the same WHERE clause.
> > >
> > > I've attached a patch that fixes this issue. Please review it.
> > >
> >
> > I have a question about the testcase (I could be wrong here).
> >
> > Is it possible that a race condition happens between the apply worker
> > (test_tab1) and the table sync worker (test_tab2)? If so, it seems the
> > error ("replication origin with OID") could happen randomly until we
> > resolve the conflict. Based on this, for the following code:
> > -----
> > # Wait for the error statistics to be updated.
> > my $check_sql = qq[SELECT count(1) > 0 ] . $part_sql;
> > $node->poll_query_until(
> > 'postgres', $check_sql,
> > ) or die "Timed out while waiting for statistics to be updated";
> >
> > * [1] *
> >
> > $check_sql =
> > qq[
> > SELECT subname, last_error_command, last_error_relid::regclass,
> > last_error_count > 0 ] . $part_sql;
> > my $result = $node->safe_psql('postgres', $check_sql);
> > is($result, $expected, $msg);
> > -----
> >
> > Is it possible that the error ("replication origin with OID") happens again
> > at place [1]? In that case, the error message we have checked could be
> > replaced by another error ("replication origin ...") and then the test
> > would fail?
> >
>
> Once we get the "duplicate key violation ..." error before * [1] * via
> the apply worker, we shouldn't get the replication origin-specific error,
> because the origin setup is done before starting to apply changes.
> Also, even if that or some other error happens after * [1] *, the test
> should still succeed because of the errmsg_prefix check. Does that make sense?

Oh, I missed the point that the origin setup is already done by the time we
get the expected error. Thanks for the explanation, and I think the patch
looks good.
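
To make sure I follow the idea of using the same WHERE clause for both check
queries, here is a minimal sketch. It is not the patch itself: the helper name
build_check_queries, the view name pg_stat_subscription_workers, and the
columns last_error_xid, last_error_message and subrelid (assumed NULL for the
apply worker) are my assumptions for illustration. It only builds and prints
the SQL, so it runs without a server.

```perl
use strict;
use warnings;

# Hypothetical helper, not taken from the patch: build the polling query and
# the result query so that both share exactly one FROM/WHERE part.
sub build_check_queries
{
	my ($relname, $xid, $by_apply_worker, $errmsg_prefix) = @_;

	# Assumption: the apply worker's row has a NULL subrelid, while a
	# tablesync worker's row has the OID of the table being synchronized.
	my $subrelid_cond =
	  $by_apply_worker ? 'subrelid IS NULL' : 'subrelid IS NOT NULL';

	# Assumed view and column names; starts_with() is a regular
	# PostgreSQL string function.
	my $part_sql = qq[
FROM pg_stat_subscription_workers
WHERE last_error_relid = '$relname'::regclass
  AND last_error_xid = '$xid'
  AND $subrelid_cond
  AND starts_with(last_error_message, '$errmsg_prefix')];

	# Query to poll until the expected error shows up in the statistics.
	my $poll_sql = qq[SELECT count(1) > 0 ] . $part_sql;

	# Query whose output is compared with the expected result.
	my $result_sql =
	  qq[SELECT subname, last_error_command, last_error_relid::regclass, last_error_count > 0 ]
	  . $part_sql;

	return ($poll_sql, $result_sql);
}

my ($poll_sql, $result_sql) =
  build_check_queries('test_tab1', 716, 1, 'duplicate key value violates');
print "$poll_sql\n\n$result_sql\n";
```

Since both queries filter on the same worker, relation, xid and message
prefix, an unrelated error reported between the two queries is not matched by
either of them.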

Best regards,
Hou zj
