[BUG] Take a long time to reach consistent after pg_rewind

From: cca5507 <cca5507(at)qq(dot)com>
To: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: [BUG] Take a long time to reach consistent after pg_rewind
Date: 2026-04-10 09:57:39
Message-ID: tencent_A37630454CE614AFE640C56E33B4A0F0AE05@qq.com
Lists: pgsql-hackers

Hi,

Steps to reproduce (PG19):

1) start two nodes, node1 (primary), node2 (standby), both with the following configuration:

```
archive_mode = on
archive_command = '/bin/true'
archive_timeout = 10
checkpoint_timeout = '60min'
wal_keep_size = 1024
logging_collector = on
```

2) promote node2

3) stop node1

4) make sure pg_current_wal_insert_lsn() on node2 is at the beginning of a WAL
segment (the LSN ends with 000028); if not, run a checkpoint and recheck
(archive_timeout will switch the WAL segment)

5) run pg_rewind on node1, using node2 as the source

6) start node1

7) now node1 can't reach a consistent state until node2 writes some WAL
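
The check in step 4 can be scripted. The following is only a sketch: it assumes the default 16MB wal_segment_size and a 40-byte (0x28) long page header at the start of each segment, which is what the "000028" suffix suggests. The helper names `parse_lsn` and `at_segment_start` are hypothetical and not part of PostgreSQL.

```python
# Assumptions (not stated in the report): default 16MB WAL segments and a
# 40-byte long page header on the first page of every segment.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
LONG_PAGE_HEADER_SIZE = 40  # 0x28


def parse_lsn(text: str) -> int:
    """Convert an LSN printed as 'X/XXXXXXXX' into a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def at_segment_start(lsn_text: str) -> bool:
    """True if the LSN sits right after the long header of a new segment."""
    return parse_lsn(lsn_text) % WAL_SEGMENT_SIZE == LONG_PAGE_HEADER_SIZE


if __name__ == "__main__":
    # Paste the output of SELECT pg_current_wal_insert_lsn(); from node2 here.
    for lsn in ("0/04000028", "0/04000048"):
        print(lsn, at_segment_start(lsn))
```

If the check returns False, repeat the checkpoint and wait for the next archive_timeout switch before running pg_rewind.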

Logs of node1:

```
2026-04-10 16:16:07.802 CST [45623] LOG: starting backup recovery with redo LSN 0/02000028, checkpoint LSN 0/02000088, on timeline ID 1
2026-04-10 16:16:07.802 CST [45623] LOG: entering standby mode
2026-04-10 16:16:07.803 CST [45623] LOG: redo starts at 0/02000028
2026-04-10 16:16:07.803 CST [45623] LOG: completed backup recovery with redo LSN 0/02000028 and end LSN 0/02000130
2026-04-10 16:16:07.806 CST [45624] LOG: started streaming WAL from primary at 0/04000000 on timeline 2
2026-04-10 16:19:13.083 CST [47039] FATAL: the database system is not yet accepting connections
2026-04-10 16:19:13.083 CST [47039] DETAIL: Consistent recovery state has not been yet reached.
2026-04-10 16:20:16.413 CST [45623] LOG: consistent recovery state reached at 0/04000048
2026-04-10 16:20:16.413 CST [45616] LOG: database system is ready to accept read-only connections
```

Root cause:

The min recovery point of node1 is at 0/04000028, but node2 doesn't have any WAL after that and may stay idle for
a long time.

Possible fix:

pg_rewind uses pg_current_wal_insert_lsn() to set the min recovery point; that function calls
GetXLogInsertRecPtr() and returns the latest WAL insert pointer. Maybe we should use
GetXLogInsertEndRecPtr(), which returns the end pointer of the latest WAL record.
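
To illustrate the difference between the two pointers, here is a simplified model (a sketch, not the server code). It assumes 8kB WAL pages, 16MB segments, a 24-byte short page header and a 40-byte long page header, and it assumes that after a segment switch the last record ends exactly at the segment boundary. The function names are hypothetical.

```python
# Assumed layout constants; they match common builds but are not taken
# from the report itself.
XLOG_BLCKSZ = 8192
WAL_SEGMENT_SIZE = 16 * 1024 * 1024
SHORT_PAGE_HEADER_SIZE = 24
LONG_PAGE_HEADER_SIZE = 40


def insert_ptr_for(end_ptr: int) -> int:
    """Where the next record would start, given where the last one ended.

    At a page boundary the insert pointer skips the page header, so it is
    ahead of the end of the last record even though no WAL exists there yet.
    """
    if end_ptr % WAL_SEGMENT_SIZE == 0:
        return end_ptr + LONG_PAGE_HEADER_SIZE
    if end_ptr % XLOG_BLCKSZ == 0:
        return end_ptr + SHORT_PAGE_HEADER_SIZE
    return end_ptr


def format_lsn(lsn: int) -> str:
    """Print a 64-bit LSN in the 'X/XXXXXXXX' form used in the logs."""
    return f"{lsn >> 32:X}/{lsn & 0xFFFFFFFF:08X}"


def standby_can_reach(min_recovery_point: int, last_record_end: int) -> bool:
    """The standby becomes consistent once replay reaches the min recovery point."""
    return last_record_end >= min_recovery_point


if __name__ == "__main__":
    end_ptr = 0x04000000                  # last record ends at the segment boundary
    insert_ptr = insert_ptr_for(end_ptr)  # what the insert pointer would report
    print(format_lsn(insert_ptr))
    print(standby_can_reach(insert_ptr, end_ptr))  # insert pointer as min recovery point
    print(standby_can_reach(end_ptr, end_ptr))     # record end pointer instead
```

Under these assumptions, a min recovery point taken from the insert pointer lies 0x28 bytes past any WAL that node2 has actually written, so node1 has to wait for the next record, while one taken from the record end pointer is reachable immediately.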

Thoughts?

--
Regards,
ChangAo Chen
