From: | "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com> |
---|---|
To: | 'Masahiko Sawada' <sawada(dot)mshk(at)gmail(dot)com> |
Cc: | Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Bertrand Drouvot <bertranddrouvot(dot)pg(at)gmail(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
Subject: | RE: POC: enable logical decoding when wal_level = 'replica' without a server restart |
Date: | 2025-09-03 03:11:12 |
Message-ID: | OSCPR01MB149669C6E5BFAADC43FCDDCBDF501A@OSCPR01MB14966.jpnprd01.prod.outlook.com |
Lists: | pgsql-hackers |
Dear Sawada-san,
Here are my comments.
01.
```
checkPoint.logicalDecodingEnabled = IsLogicalDecodingEnabled();
```
Per my analysis, the value is always false here because StartupLogicalDecodingStatus()
has not been called yet. Can we use "false" directly?
02.
```
elog(DEBUG1, "waiting for %d transactions to complete", running->xcnt);
```
Here the plural form is always used, even when there is only one running transaction.
How about something like:
```
Number of transactions to wait finishing: %d
```
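Another option would be to pick the singular or plural wording at run time. The following is only a minimal sketch in plain C, not code from the patch: format_wait_message() is a hypothetical helper that I made up for illustration. For translatable ereport() messages PostgreSQL has errmsg_plural(), but I'm not sure that is worth it for a DEBUG1 elog().

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper (not part of the patch): format the wait message,
 * using the singular form when exactly one transaction is running.
 * Returns the value returned by snprintf().
 */
static int
format_wait_message(char *buf, size_t buflen, int xcnt)
{
    assert(buf != NULL);

    return snprintf(buf, buflen,
                    xcnt == 1 ? "waiting for %d transaction to complete"
                              : "waiting for %d transactions to complete",
                    xcnt);
}
```

With xcnt = 1 this produces "waiting for 1 transaction to complete", and with any other count it keeps the current plural wording.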
03.
```
while (RecoveryInProgress())
{
    pgstat_report_wait_start(WAIT_EVENT_LOGICAL_DECODING_STATUS_CHANGE_DELAY);
    pg_usleep(100000L); /* wait for 100 msec */
    pgstat_report_wait_end();
}
```
I found a case that can get stuck here: if a backend process is inside this loop
while the startup process is waiting for a signal to be processed, both of them can
get stuck. The backend waits for the recovery state to become DONE, and the startup
process waits for all processes to consume the signal.
IIUC we must add CHECK_FOR_INTERRUPTS() or ProcessProcSignalBarrier() to the loop.
Actual steps:
0. Constructed a streaming replication system in which only the primary server had
a logical slot, i.e., the effective_wal_level was logical.
1. Connected to a standby node.
2. Attached to the backend process via gdb.
3. Added a breakpoint at create_logical_replication_slot.
4. Called pg_create_logical_replication_slot() on the backend.
The backend stops before ReplicationSlotCreate().
5. From another terminal, attached to the startup process via gdb.
6. Added a breakpoint at UpdateLogicalDecodingStatusEndOfRecovery().
7. From another terminal, sent a promote signal to the standby.
The startup process stops at UpdateLogicalDecodingStatusEndOfRecovery().
8. Stepped through the startup process until delay_status_change was updated
and LogicalDecodingControlLock was released.
9. Detached from the backend process. It then waits in the loop in
start_logical_decoding_status_change().
10. Detached from the startup process. It waits for all processes to handle the
signal, but the backend never does.
Best regards,
Hayato Kuroda
FUJITSU LIMITED