Re: Handle infinite recursion in logical replication setup

From: Peter Smith <smithpb2250(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Handle infinite recursion in logical replication setup
Date: 2022-04-12 04:56:12
Message-ID: CAHut+Ptx2gPMmNMdiq+izLhZcv66t80r6n91KXegVc1cgC2qGQ@mail.gmail.com
Lists: pgsql-hackers

On Thu, Apr 7, 2022 at 2:09 PM Peter Smith <smithpb2250(at)gmail(dot)com> wrote:
>
> FYI, here is a test script that is using the current patch (v6) to
> demonstrate a way to share table data between different numbers of
> nodes (up to 5 of them here).
>
> The script starts off with just 2-way sharing (nodes N1, N2),
> then expands to 3-way sharing (nodes N1, N2, N3),
> then 4-way sharing (nodes N1, N2, N3, N4),
> then 5-way sharing (nodes N1, N2, N3, N4, N5).
>
> As an extra complication, for this test, all 5 nodes have different
> initial table data, which gets replicated to the others whenever each
> new node joins the existing share group.
>
> PSA.
>

Hi Vignesh. I had some problems getting the above test script working.
It was OK up until I tried to join the 5th node (N5) to the existing 4
nodes. The ERROR manifested itself strangely: there appeared to be a
unique index violation in the pg_subscription_rel catalog, even though,
AFAIK, N5 did not have any entries in it.

e.g.
2022-04-07 09:13:28.361 AEST [24237] ERROR: duplicate key value violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
2022-04-07 09:13:28.361 AEST [24237] DETAIL: Key (srrelid, srsubid)=(16384, 16393) already exists.
2022-04-07 09:13:28.361 AEST [24237] STATEMENT: create subscription sub51 connection 'port=7651' publication pub1 with (subscribe_local_only=true,copy_data=force);
2022-04-07 09:13:28.380 AEST [24237] ERROR: duplicate key value violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
2022-04-07 09:13:28.380 AEST [24237] DETAIL: Key (srrelid, srsubid)=(16384, 16394) already exists.
2022-04-07 09:13:28.380 AEST [24237] STATEMENT: create subscription sub52 connection 'port=7652' publication pub2 with (subscribe_local_only=true,copy_data=false);
2022-04-07 09:13:28.405 AEST [24237] ERROR: duplicate key value violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
2022-04-07 09:13:28.405 AEST [24237] DETAIL: Key (srrelid, srsubid)=(16384, 16395) already exists.
2022-04-07 09:13:28.405 AEST [24237] STATEMENT: create subscription sub53 connection 'port=7653' publication pub3 with (subscribe_local_only=true,copy_data=false);
2022-04-07 09:13:28.425 AEST [24237] ERROR: duplicate key value violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
2022-04-07 09:13:28.425 AEST [24237] DETAIL: Key (srrelid, srsubid)=(16384, 16396) already exists.
2022-04-07 09:13:28.425 AEST [24237] STATEMENT: create subscription sub54 connection 'port=7654' publication pub4 with (subscribe_local_only=true,copy_data=false);
2022-04-07 09:17:52.472 AEST [25852] ERROR: duplicate key value violates unique constraint "pg_subscription_rel_srrelid_srsubid_index"
2022-04-07 09:17:52.472 AEST [25852] DETAIL: Key (srrelid, srsubid)=(16384, 16397) already exists.
2022-04-07 09:17:52.472 AEST [25852] STATEMENT: create subscription sub51 connection 'port=7651' publication pub1;

~~~

When I debugged this, it seemed that each CREATE SUBSCRIPTION was
trying to make a double entry, because fetch_tables (whose SQL your
patch v6-0002 modifies) was returning the same table twice.

(gdb) bt
#0 errfinish (filename=0xbc1057 "nbtinsert.c", lineno=671,
funcname=0xbc25e0 <__func__.15798> "_bt_check_unique") at elog.c:510
#1 0x0000000000526d83 in _bt_check_unique (rel=0x7f654219c2a0,
insertstate=0x7ffd9629ddd0, heapRel=0x7f65421b0e28,
checkUnique=UNIQUE_CHECK_YES, is_unique=0x7ffd9629de01,
speculativeToken=0x7ffd9629ddcc) at nbtinsert.c:664
#2 0x0000000000526157 in _bt_doinsert (rel=0x7f654219c2a0,
itup=0x19ea8e8, checkUnique=UNIQUE_CHECK_YES, indexUnchanged=false,
heapRel=0x7f65421b0e28) at nbtinsert.c:208
#3 0x000000000053450e in btinsert (rel=0x7f654219c2a0,
values=0x7ffd9629df10, isnull=0x7ffd9629def0, ht_ctid=0x19ea894,
heapRel=0x7f65421b0e28, checkUnique=UNIQUE_CHECK_YES,
indexUnchanged=false, indexInfo=0x19dea80) at nbtree.c:201
#4 0x00000000005213b6 in index_insert (indexRelation=0x7f654219c2a0,
values=0x7ffd9629df10, isnull=0x7ffd9629def0, heap_t_ctid=0x19ea894,
heapRelation=0x7f65421b0e28, checkUnique=UNIQUE_CHECK_YES,
indexUnchanged=false, indexInfo=0x19dea80) at indexam.c:193
#5 0x00000000005c81d5 in CatalogIndexInsert (indstate=0x19de540,
heapTuple=0x19ea890) at indexing.c:158
#6 0x00000000005c8325 in CatalogTupleInsert (heapRel=0x7f65421b0e28,
tup=0x19ea890) at indexing.c:231
#7 0x00000000005f0170 in AddSubscriptionRelState (subid=16400,
relid=16384, state=105 'i', sublsn=0) at pg_subscription.c:315
#8 0x00000000006d6fa5 in CreateSubscription (pstate=0x1942dc0,
stmt=0x191f6a0, isTopLevel=true) at subscriptioncmds.c:767

~~

Aside: all of this was happening when I did not have enough logical
replication workers configured. (There were WARNINGs in the logfile
that I had not noticed.)
When I fixed the configuration, all these other problems went away!
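For a rough sense of how quickly a mesh can exhaust the defaults: the PostgreSQL documentation describes max_logical_replication_workers (default 4) as covering both apply workers and table synchronization workers, and max_sync_workers_per_subscription defaults to 2. The C sketch below assumes a full mesh in which every node has one subscription to each of its peers, and that every subscription might be running an initial table sync at the same moment. apply_workers_needed and peak_workers_needed are hypothetical helpers for this arithmetic, not PostgreSQL functions.

```c
#include <assert.h>

/*
 * Hypothetical helper (not a PostgreSQL API).
 * Assumes a full mesh: each node subscribes once to each of its
 * n_nodes - 1 peers, and each enabled subscription has one apply worker.
 */
static int
apply_workers_needed(int n_nodes)
{
    assert(n_nodes >= 1);
    return n_nodes - 1;
}

/*
 * Hypothetical worst case while initial copies are still in progress:
 * every subscription has its apply worker plus up to sync_per_sub
 * tablesync workers (cf. max_sync_workers_per_subscription).
 */
static int
peak_workers_needed(int n_nodes, int sync_per_sub)
{
    assert(sync_per_sub >= 0);
    return apply_workers_needed(n_nodes) * (1 + sync_per_sub);
}
```

Under these assumptions a node in the 5-way share group (4 subscriptions) already needs 4 apply workers, which is the entire default budget, and up to 12 workers in the worst case during initial sync. Actual usage should be lower for subscriptions created with copy_data=false, since those tables need no tablesync worker.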

~~

So, to summarize: I'm not sure whether fetch_tables still has some
potential problem lurking, but I feel the SQL in that function needs a
closer look to ensure it can never return the same table more than
once.
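One way to make "never the same table twice" hold regardless of how the publisher-side query is written is to deduplicate the fetched (schema, table) list before creating one pg_subscription_rel entry per row. The C sketch below is illustrative only: TableName, table_name_cmp and dedup_tables are hypothetical names, not PostgreSQL APIs. Inside the query itself, a SELECT DISTINCT on the schema and table name columns would serve the same purpose.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one row of the fetched table list. */
typedef struct
{
    const char *schemaname;
    const char *tablename;
} TableName;

/* Order rows by schema name, then by table name. */
static int
table_name_cmp(const void *a, const void *b)
{
    const TableName *x = (const TableName *) a;
    const TableName *y = (const TableName *) b;
    int         c = strcmp(x->schemaname, y->schemaname);

    return (c != 0) ? c : strcmp(x->tablename, y->tablename);
}

/*
 * Sort the rows, then squeeze out adjacent duplicates in place.
 * Returns the new row count; afterwards each (schema, table) pair
 * appears once, so one catalog insert per row cannot repeat the same
 * (srrelid, srsubid) key for this subscription.
 */
static size_t
dedup_tables(TableName *rows, size_t n)
{
    size_t      out = 0;

    assert(rows != NULL || n == 0);
    if (n == 0)
        return 0;

    qsort(rows, n, sizeof(TableName), table_name_cmp);
    for (size_t i = 1; i < n; i++)
    {
        if (table_name_cmp(&rows[out], &rows[i]) != 0)
            rows[++out] = rows[i];
    }
    return out + 1;
}
```

Deduplicating on the subscriber is only a safety net; if the query can legitimately produce repeated rows, fixing the SQL is still the cleaner solution.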

------
Kind Regards,
Peter Smith.
Fujitsu Australia.