From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
---|---|
To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
Subject: | Possible missing segments in archiving on standby |
Date: | 2020-06-30 07:55:03 |
Message-ID: | 20200630.165503.1465894182551545886.horikyota.ntt@gmail.com |
Lists: | pgsql-hackers |
Hello.
While looking at a patch, I found that a standby with archive_mode=always
fails to archive segments under certain conditions.
A. Walreceiver is gracefully terminated just after a segment is
finished.
B. Walreceiver is gracefully terminated while receiving filling chunks
for a segment switch.
The two cases above are reproducible (without distinguishing between
them) using a helper patch. See below.
There's one more issue here.
C. Standby doesn't archive a segment until walreceiver receives any
data for the next segment.
I'm not sure whether we should regard C as an issue.
The first attached patch fixes A and B. A side effect is that the
standby also archives the segment preceding the streaming start
location. Concretely, 00..0100..2 gets archived in the above case
(recovery starts at 0/3000000). That behavior doesn't seem to be a
problem, since the segment is part of the standby's data anyway.
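To make the segment arithmetic concrete, here is a minimal sketch that maps an LSN to its WAL segment file name and to the name of the segment before it. It assumes the default 16 MB wal_segment_size and timeline 1; the function name wal_file_name is made up for this illustration and is not part of either patch.

```shell
# Hypothetical helper, not part of the attached patches.
# Assumption: default 16 MB wal_segment_size.
WAL_SEGMENT_SIZE=$((16 * 1024 * 1024))

# usage: wal_file_name TIMELINE LSN [SEGMENTS_BACK]
# The body runs in a subshell so its variables do not leak.
wal_file_name() (
    tli=$1
    lsn=$2
    back=${3:-0}
    hi=${lsn%%/*}    # high 32 bits of the LSN, in hex
    lo=${lsn##*/}    # low 32 bits of the LSN, in hex
    pos=$(( (0x$hi << 32) + 0x$lo ))
    segs_per_id=$(( 0x100000000 / WAL_SEGMENT_SIZE ))
    segno=$(( pos / WAL_SEGMENT_SIZE - back ))
    printf '%08X%08X%08X\n' "$tli" \
        $(( segno / segs_per_id )) $(( segno % segs_per_id ))
)

wal_file_name 1 0/3000000      # segment containing the streaming start
wal_file_name 1 0/3000000 1    # the segment just before it
```

Under those assumptions the first call prints 000000010000000000000003 and the second prints 000000010000000000000002, which is the segment written as 00..0100..2 above.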
The second attached patch fixes all of A to C, but seems somewhat
redundant.
Any opinions and/or suggestions are welcome.
The attached files are:
1. v1-0001-Make-sure-standby-archives-all-segments.patch:
Fix for A and B.
2. v1-0001-Make-sure-standby-archives-all-segments-immediate.patch:
Fix for A, B and C.
3. repro.sh
The reproducer shell script used below.
4. repro_helper.patch
Helper patch for repro.sh for master and patch 1 above.
5. repro_helper2.patch
Helper patch for repro.sh for patch 2 above.
=====
** REPRODUCER
The failure is reproducible with some code tweaks.
1. Create a primary server with archive_mode=always then start it.
2. Create and start a standby.
3. touch /tmp/hoge
4. psql -c "create table t(); drop table t; select pg_switch_wal(); select pg_sleep(1); create table t(); drop table t; select pg_switch_wal();"
5. Look into the archive directory of the standby.
If no segments are missing from the archive, repeat from step 3.
The third attached file, a shell script, is a reproducer for the
problem. It needs the aid of the fourth attached file, a helper patch.
$ mkdir testdir
$ cd testdir
$ bash ..../repro.sh
....
After test 2:
Primary location: 0/8000310
Standby location: 0/8000310
# primary archive
000000010000000000000003
000000010000000000000004
000000010000000000000005
000000010000000000000006
000000010000000000000007
000000010000000000000008
# standby archive
000000010000000000000003
000000010000000000000005
000000010000000000000006
000000010000000000000008
Segment 4 is skipped because of issue A, and segment 7 is skipped
because of issue B.
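The comparison of the two archives can also be done mechanically. The following is a rough sketch only: the listings are copied from the output above, the function name missing_segments is made up for this illustration, and in practice the two lists would come from ls on each archive directory (the paths are not shown here).

```shell
# Listings copied from the "After test 2" output above.
primary_archive='000000010000000000000003
000000010000000000000004
000000010000000000000005
000000010000000000000006
000000010000000000000007
000000010000000000000008'

standby_archive='000000010000000000000003
000000010000000000000005
000000010000000000000006
000000010000000000000008'

# Hypothetical helper: print every line of $1 that does not appear
# as a whole line in $2 (-F fixed strings, -x whole-line match,
# -v invert the match).
missing_segments() {
    printf '%s\n' "$1" | grep -vxF -e "$2"
}

missing_segments "$primary_archive" "$standby_archive"
```

For the listings above this prints 000000010000000000000004 and 000000010000000000000007, the two segments missing from the standby archive.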
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Make-sure-standby-archives-all-segments.patch | text/x-patch | 1.5 KB |
v1-0001-Make-sure-standby-archives-all-segments-immediate.patch | text/x-patch | 4.3 KB |
repro_helper.patch | text/x-patch | 999 bytes |
repro_helper2.patch | text/x-patch | 966 bytes |