RE: Synchronizing slots from primary to standby

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'Ajin Cherian' <itsajin(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Peter Smith <smithpb2250(at)gmail(dot)com>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Subject: RE: Synchronizing slots from primary to standby
Date: 2023-09-25 14:15:59
Message-ID: TYAPR01MB5866F0B657A7F4DC9D999C7AF5FCA@TYAPR01MB5866.jpnprd01.prod.outlook.com
Lists: pgsql-hackers

Dear Ajin, Shveta,

Thank you for rebasing the patch set! Here are new comments for v19_2-0001.

01. WalSndWaitForStandbyNeeded()

```
if (SlotIsPhysical(MyReplicationSlot))
    return false;
```

Is it possible for a physical walsender to call this function?
IIUC, the following is the call stack for the function, so only logical walsenders use it.
If so, this should be an Assert() instead of an if statement.

logical_read_xlog_page()
WalSndWaitForWal()
WalSndWaitForStandbyNeeded()
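
A minimal sketch of the Assert() variant, for illustration only. SlotStub, the slot_is_listed parameter, and the lower-case function name are hypothetical stand-ins and are not taken from the patch; the standard assert() stands in for PostgreSQL's Assert(), which is compiled out of builds without assertions enabled.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for ReplicationSlot; not the real definition. */
typedef struct SlotStub
{
    bool is_physical;
} SlotStub;

/*
 * Hypothetical stand-in for WalSndWaitForStandbyNeeded(). If only logical
 * walsenders can reach this function, the physical case is a programming
 * error, so it is asserted rather than handled at run time.
 */
bool
wal_snd_wait_for_standby_needed(const SlotStub *slot, bool slot_is_listed)
{
    assert(!slot->is_physical);

    /* Assumption: waiting is needed only when the slot is in the list. */
    return slot_is_listed;
}
```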

02. WalSndWaitForStandbyNeeded()

Can we set shouldwait in SlotSyncInitConfig()? synchronize_slot_names_list is
searched every time this function is called, but the list does not change on its own.
If the slot name is compared with the list in SlotSyncInitConfig(), the
number of linear searches can be reduced.
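
Roughly, the idea could look like the sketch below. All names here (shouldwait_cached, slot_sync_init_config, the argument list) are hypothetical stand-ins, and the list is modeled as a plain array of strings rather than the patch's actual list type.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical cached answer, refreshed only when the config is loaded. */
static bool shouldwait_cached = false;

/*
 * Stand-in for SlotSyncInitConfig(): the linear search runs once per
 * configuration load instead of once per call of the wait check.
 */
void
slot_sync_init_config(const char *my_slot_name,
                      const char *const *slot_names, size_t nnames)
{
    assert(my_slot_name != NULL);

    shouldwait_cached = false;
    for (size_t i = 0; i < nnames; i++)
    {
        if (strcmp(slot_names[i], my_slot_name) == 0)
        {
            shouldwait_cached = true;
            break;
        }
    }
}

/* Stand-in for WalSndWaitForStandbyNeeded(): no search, just the flag. */
bool
wal_snd_wait_for_standby_needed(void)
{
    return shouldwait_cached;
}
```

The trade-off is that the flag must be recomputed wherever the list can change, otherwise it goes stale.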

03. WalSndWaitForStandbyConfirmation()

We should call ProcessRepliesIfAny() inside the loop; otherwise the walsender
will not notice the death of an apply worker.

04. WalSndWaitForStandbyConfirmation()

Not sure, but do we have to return early if the walsender receives the
PROCSIG_WALSND_INIT_STOPPING signal? I think that if the physical walsenders get
stuck, the logical walsenders will wait forever. In that case we cannot stop the
primary server even if "pg_ctl stop" is executed.
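
To make comments 03 and 04 concrete, here is a rough sketch of the loop shape I have in mind. It does not use the real walsender code: WaitHooks, SimState, and the sim_* functions are hypothetical, and the callbacks merely stand in for the standby-confirmation check, ProcessRepliesIfAny(), and the stop-request check.

```c
#include <stdbool.h>

/* Hypothetical callbacks standing in for walsender internals. */
typedef struct WaitHooks
{
    bool (*standby_confirmed)(void *arg); /* has the standby caught up? */
    void (*process_replies)(void *arg);   /* stand-in for ProcessRepliesIfAny() */
    bool (*stop_requested)(void *arg);    /* stand-in for the stopping signal */
    void *arg;
} WaitHooks;

/* Returns true once confirmed, false if it gave up because of a stop request. */
bool
wait_for_standby_confirmation(const WaitHooks *hooks)
{
    for (;;)
    {
        if (hooks->standby_confirmed(hooks->arg))
            return true;

        /* Comment 03: keep consuming replies so a dead peer is noticed. */
        hooks->process_replies(hooks->arg);

        /* Comment 04: do not block shutdown while the standby is stuck. */
        if (hooks->stop_requested(hooks->arg))
            return false;

        /* A real walsender would sleep on its latch here. */
    }
}

/* Tiny simulation used only to exercise the loop. */
typedef struct SimState
{
    int confirm_after; /* confirmed once this many replies were processed */
    int stop_after;    /* stop requested once this many were processed */
    int replies_processed;
} SimState;

bool
sim_confirmed(void *arg)
{
    SimState *s = arg;
    return s->replies_processed >= s->confirm_after;
}

void
sim_process(void *arg)
{
    SimState *s = arg;
    s->replies_processed++;
}

bool
sim_stop(void *arg)
{
    SimState *s = arg;
    return s->replies_processed >= s->stop_after;
}
```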

05. SlotSyncInitConfig()

Why don't we free the memory for rawname, old standby_slot_names_list, and synchronize_slot_names_list?
They seem to be overwritten without being freed.

06. SlotSyncInitConfig()

Both physical and logical walsenders call this function, but physical ones do not
use the lists, right? If so, can we add a quick exit for physical walsenders?
Or, we should carefully remove the calls made by physical walsenders.

07. StartReplication()

I think we do not have to call SlotSyncInitConfig() here.
An alternative approach is described above.

08. Other

Also, I found unexpected behavior after both 0001 and 0002 were applied.
Is this expected or not?

1. constructed the setup below
(ensured that the logical slot existed on the secondary)
2. stopped the primary
3. promoted the secondary server
4. disabled the subscription
5. changed the connection string for the subscriber
6. inserted data into the new primary
7. enabled the subscription again
8. got an ERROR: replication slot "sub" does not exist

I expected that logical replication would restart, but it did not.
Is this a real issue or my mistake? The error appears in secondary.log.

```
Setup:
primary--->secondary
|
|
subscriber
```

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
test_0925.sh application/octet-stream 2.7 KB
