Re: [BUG] Panic due to incorrect missingContrecPtr after promotion

From: "Imseih (AWS), Sami" <simseih(at)amazon(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: "alvherre(at)alvh(dot)no-ip(dot)org" <alvherre(at)alvh(dot)no-ip(dot)org>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [BUG] Panic due to incorrect missingContrecPtr after promotion
Date: 2022-02-25 04:27:19
Message-ID: 379F2F42-F19A-4026-AB09-42E6F9ABB5F5@amazon.com
Lists: pgsql-hackers

> Nice catch! However, I'm not sure I like the patch.

> * made it through and start writing after the portion that persisted.
> * (It's critical to first write an OVERWRITE_CONTRECORD message, which
> * we'll do as soon as we're open for writing new WAL.)
> + *
> + * If the last wal record is ahead of the missing contrecord, this is a
> + * recently promoted primary and we should not write an overwrite
> + * contrecord.

> Before the part, the comment follows the part shown below.

> > * Actually, if WAL ended in an incomplete record, skip the parts that

> So, actually WAL did not end in an incomplete record. I think
> FinishWalRecovery is the last place to do that. (But it could be
> earlier.)

Thanks for the feedback.

## On the primary, an OVERWRITE_CONTRECORD is written. The aborted record pointer (abortedRecPtr) and the insert position of the missing contrecord (missingContrecPtr) are set during FinishWalRecovery; after recovery finishes, but before writes are accepted, the OVERWRITE_CONTRECORD is written.
## On the primary, the logs show:

2022-02-25 02:25:16.208 UTC [7397] LOG: redo done at 0/1FFD2B8 system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.01 s
2022-02-25 02:25:16.208 UTC [7397] DEBUG: resetting unlogged relations: cleanup 0 init 1
2022-02-25 02:25:16.209 UTC [7397] DEBUG: creating and filling new WAL file
2022-02-25 02:25:16.217 UTC [7397] DEBUG: done creating and filling new WAL file
2022-02-25 02:25:16.217 UTC [7397] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2022-02-25 02:25:16.217 UTC [7397] DEBUG: MultiXact member stop limit is now 4294914944 based on MultiXact 1
2022-02-25 02:25:16.217 UTC [7397] DEBUG: OVERWRITE_CONTRECORD inserted at 0/2000028 for aborted record 0/1FFD2E0 <<------- The attached v3 patch adds this logging; it shows the overwrite record being created on the primary
2022-02-25 02:25:16.217 UTC [7395] LOG: checkpoint starting: end-of-recovery immediate wait
2022-02-25 02:25:16.217 UTC [7395] DEBUG: performing replication slot checkpoint
2022-02-25 02:25:16.218 UTC [7395] DEBUG: attempting to remove WAL segments older than log file 000000000000000000000000
2022-02-25 02:25:16.218 UTC [7395] LOG: checkpoint complete: wrote 131 buffers (102.3%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.001 s; sync files=0, longest=0.000 s, average=0.000 s; distance=11381 kB, estimate=11381 kB
2022-02-25 02:25:16.219 UTC [7394] DEBUG: starting background worker process "logical replication launcher"
2022-02-25 02:25:16.219 UTC [7394] LOG: database system is ready to accept connections

## A downstream standby now skips over this OVERWRITE_CONTRECORD, but missingContrecPtr is not invalidated afterwards.
## On the standby, the logs show:

2022-02-25 02:25:16.616 UTC [7413] DEBUG: sending hot standby feedback xmin 0 epoch 0 catalog_xmin 0 catalog_xmin_epoch 0
2022-02-25 02:25:16.616 UTC [7387] LOG: successfully skipped missing contrecord at 0/1FFD2E0, overwritten at 2022-02-25 02:25:16.2175+00
2022-02-25 02:25:16.616 UTC [7387] CONTEXT: WAL redo at 0/2000028 for XLOG/OVERWRITE_CONTRECORD: lsn 0/1FFD2E0; time 2022-02-25 02:25:16.2175+00

## After promotion, the standby attempts to write the OVERWRITE_CONTRECORD again at the missingContrecPtr LSN, which is now in the past.
## on the standby logs

2022-02-25 02:25:16.646 UTC [7387] LOG: invalid record length at 0/201EC70: wanted 24, got 0
2022-02-25 02:25:16.646 UTC [7387] LOG: redo done at 0/201EC48 system usage: CPU: user: 0.01 s, system: 0.00 s, elapsed: 0.55 s
2022-02-25 02:25:16.646 UTC [7387] LOG: last completed transaction was at log time 2022-02-25 02:25:16.301554+00
2022-02-25 02:25:16.646 UTC [7387] DEBUG: resetting unlogged relations: cleanup 0 init 1
2022-02-25 02:25:16.646 UTC [7387] LOG: selected new timeline ID: 2
2022-02-25 02:25:16.646 UTC [7387] DEBUG: updated min recovery point to 0/201EC70 on timeline 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG: could not remove file "pg_wal/000000020000000000000002": No such file or directory
2022-02-25 02:25:16.656 UTC [7387] LOG: archive recovery complete
2022-02-25 02:25:16.656 UTC [7387] DEBUG: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG: MultiXact member stop limit is now 4294914944 based on MultiXact 1
2022-02-25 02:25:16.656 UTC [7387] DEBUG: OVERWRITE_CONTRECORD inserted at 0/2000028 for aborted record 0/1FFD2E0 <<------- The same overwrite record is written on the recently promoted standby
2022-02-25 02:25:16.656 UTC [7387] DEBUG: attempting to remove WAL segments newer than log file 000000020000000000000001
2022-02-25 02:25:16.656 UTC [7387] DEBUG: recycled write-ahead log file "000000010000000000000002"
2022-02-25 02:25:16.656 UTC [7387] DEBUG: release all standby locks
2022-02-25 02:25:16.656 UTC [7385] LOG: checkpoint starting: force
2022-02-25 02:25:16.656 UTC [7385] DEBUG: performing replication slot checkpoint
2022-02-25 02:25:16.656 UTC [7385] LOG: request to flush past end of generated WAL; request 0/201EC70, current position 0/2000088
2022-02-25 02:25:16.656 UTC [7385] PANIC: xlog flush request 0/201EC70 is not satisfied --- flushed only to 0/2000088
2022-02-25 02:25:16.657 UTC [7384] LOG: checkpointer process (PID 7385) was terminated by signal 6: Aborted

The purpose of the patch is to check whether the current LSN is already beyond missingContrecPtr, and if so to skip writing an overwrite contrecord again, since the old writer already wrote it and re-writing it here is incorrect.

--
Sami Imseih
Amazon Web Services

Attachment Content-Type Size
v3-0001-Fix-missing-continuation-record-after-standby-pro.patch application/octet-stream 5.6 KB
