RE: Synchronizing slots from primary to standby

From: "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
To: 'shveta malik' <shveta(dot)malik(at)gmail(dot)com>
Cc: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Drouvot, Bertrand" <bertranddrouvot(dot)pg(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, Ashutosh Sharma <ashu(dot)coek88(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Ajin Cherian <itsajin(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Peter Smith <smithpb2250(at)gmail(dot)com>
Subject: RE: Synchronizing slots from primary to standby
Date: 2023-10-02 10:59:06
Message-ID: TYAPR01MB5866A953535944E6FE6A86AEF5C5A@TYAPR01MB5866.jpnprd01.prod.outlook.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Dear Shveta,

Thank you for updating the patch!

I found another ERROR due to the slot removal. Is this a real issue?

1. applied add_sleep.txt, which emulated the case the tablesync worker stucked
and the primary crashed during the
initial sync.
2. executed test_0925_v2.sh (You attached in [1])
3. secondary could not start the logical replication because the slot was not
created (log files were also attached).

Here is my analysis. The cause is that the slotsync worker aborts the slot creation
on secondary server because the restart_lsn of secondary ahead the primary's one.
IIUC it can be occurred when tablesync workers finishes initial copy before
walsenders stream changes. In this case, the relstate of the worker is set to
SUBREL_STATE_CATCHUP and the apply worker waits till the relation becomes
SUBREL_STATE_SYNCDONE. From here the slot on primary will not be updated until
the relation is caught up. If some changes are come and the primary crashes at
that time, the syncslot worker will abort the slot creation.

Anyway, followings are my comments.
I have not checked detailed conventions yet. It should be done in later stage.

~~~~~~~~~~~~~~~~

For 0001:
===

WalSndWaitForStandbyConfirmation()

```
+ /* If postmaster asked us to stop, don't wait anymore */
+ if (got_STOPPING)
+ break;
```

I have considered again, and it may still have an issue: logical walsenders may
break from the loop before physical walsenders send WALs. This may be occurred
because both physical and logical walsenders would get PROCSIG_WALSND_INIT_STOPPING.

I think a function like WalSndWaitStopping() must be needed, which waits until
physical walsenders become WALSNDSTATE_STOPPING or exit. Thought?

WalSndWaitForStandbyConfirmation()

```
+ standby_slot_cpy = list_copy(standby_slot_names_list);
```

I found that standby_slot_names_list and standby_slot_cpy would not be updated
even if the GUC was updated. Is this acceptable? Won't it be occurred after you
refactor the patch?
What would be occurred when synchronize_slot_names is updated on secondary
while primary executes this?

WalSndWaitForStandbyConfirmation()

```
+
+ goto retry;
```

I checked other "goto retry;", but I could not find the pattern that the return
clause does not exist after the goto (exception: void function). I also think
that current style seems a bit strange. How about using an outer loop like
While (list_length(standby_slot_cpy))?

=====

slot.h

```
+extern void WaitForStandbyLSN(XLogRecPtr wait_for_lsn);
```

WaitForStandbyLSN() does not exist.

~~~~~~~~~~~~~~~~

For 0002:
=====

General

The patch requires that primary_conninfo must contain the dbname, but it
conflicts with documentation. It says:

```
...Do not specify a database name in the primary_conninfo string.
```

I confirmed [^a] it is harmless that primary_conninfo has dbname, but at least
the description must be fixed.

General

I found that primary server output huge amount of logs when the log_min_duration_messages = 0.
This ie because slotsync worker sends an SQL per 10ms, in wait_for_primary_slot_catchup().
Is there any good way to suppress it? Or, should we be patient?

=====

```
+{ oid => '6312', descr => 'what caused the replication slot to become invalid',
```

How did you determine the oid? IIRC, developping features should use oids in
the range 8000-9999. See src/include/catalog/unused_oids.

=====

LogicalRepCtxStruct

```
/* Background workers. */
+ SlotSyncWorker *ss_workers; /* slot-sync workers */
LogicalRepWorker workers[FLEXIBLE_ARRAY_MEMBER];
```

It's OK for now, but can we combine them into an array? IIUC there is no
possibility to exist both of processes and they have same component, so it may
be able to be same. It can reduce an attribute but may lead some
difficulties to read.

WaitForReplicationWorkerAttach() and logicalrep_worker_stop_internal()

I could not find cases that has "LWLock *" as an argument (exception: functions in lwlock.c).
Is it sufficient to check RecoveryInProgress() instead of specifying as arguments?

=====

wait_for_primary_slot_catchup()

```
+ /* Check if this standby is promoted while we are waiting */
+ if (!RecoveryInProgress())
+ {
+ /*
+ * The remote slot didn't pass the locally reserved position at
+ * the time of local promotion, so it's not safe to use.
+ */
+ ereport(
+ WARNING,
+ (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+ errmsg(
+ "slot-sync wait for slot %s interrupted by promotion, "
+ "slot creation aborted", remote_slot->name)));
+ pfree(cmd.data);
+ return false;
+ }
```

The part would not be executed if the promote signal is sent after the primary
server crashes. I think walrcv_exec() will detect the failure first.
The function must be wrapped by PG_TRY() and the message must be emitted in
PG_CATCH(). There may be other approaches.

wait_for_primary_slot_catchup()

```
+ rc = WaitLatch(MyLatch,
+ WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
+ WORKER_DEFAULT_NAPTIME_MS,
+ WAIT_EVENT_REPL_SLOTSYNC_MAIN);
```

New wait event can be added.

[1]: https://www.postgresql.org/message-id/CAJpy0uDD%2B9aJnDx9fBfvLvxJtxA7qqoAys4fo6h1tq1b_0_A7Q%40mail.gmail.com
[^a]

Regarding the secondary side, the libpqrcv_connect() does not do special things
even if the primary_conninfo has dbname="XXX". It adds parameters like
"replication=true" and sends a startup packet.

As for the primary side, the startup packet is consumed in ProcessStartupPacket().
It checks whether the process should be a walsender or not (line 2204).

Then (line 2290) the port->database_name[0] is set as '\0' in case of walsender.
The value is used for setting the process title in BackendInitialize().

Also, InitPostgres() really sets some global variables like MyDatabaseId,
but it is not occurred when the process is walsender.

Best Regards,
Hayato Kuroda
FUJITSU LIMITED

Attachment Content-Type Size
add_sleep.txt text/plain 654 bytes
primary.log application/octet-stream 226.5 KB
secondary.log application/octet-stream 6.2 KB
subscriber.log application/octet-stream 4.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dagfinn Ilmari Mannsåker 2023-10-02 11:44:59 Re: Replace (stat(<file>))[7] in TAP tests with -s
Previous Message Dean Rasheed 2023-10-02 10:49:40 Re: Making the subquery alias optional in the FROM clause