| From: | Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> |
|---|---|
| To: | pgsql-hackers(at)lists(dot)postgresql(dot)org |
| Subject: | Possible missing segments in archiving on standby |
| Date: | 2020-06-30 07:55:03 |
| Message-ID: | 20200630.165503.1465894182551545886.horikyota.ntt@gmail.com |
| Lists: | pgsql-hackers |
Hello.
While looking at a patch, I found that a standby with archive_mode=always
fails to archive segments under certain conditions.
A. Walreceiver is gracefully terminated just after a segment is
finished.
B. Walreceiver is gracefully terminated while receiving the filler chunks
of a segment switch.
The two cases above are reproducible (without distinction between the two)
using a helper patch. See below.
There's one more issue here.
C. Standby doesn't archive a segment until walreceiver receives any
data for the next segment.
I'm not sure whether we should regard C as an issue.
The first attached patch fixes A and B. A side effect of that is that
the standby also archives the segment preceding the streaming start
location. Concretely, 00..0100..2 gets archived in the case below
(recovery starts at 0/3000000). That behavior doesn't seem to be a
problem, since the segment is a part of the standby's data anyway.
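To make the segment arithmetic concrete: a WAL file name is built from the timeline ID and the segment number, and an LSN of 0/3000000 sits at the start of segment 3, so the preceding segment is number 2. The sketch below reproduces that calculation in shell. It assumes the default 16MB wal_segment_size, and wal_file_name is a hypothetical helper name used only for illustration, not something from the attached patches.

```sh
# Hypothetical helper: print the WAL segment file name that contains a
# given LSN. Assumes the default 16MB segment size unless a third
# argument (segment size in bytes) is passed.
wal_file_name() (
    tli=$1
    lsn=$2
    seg_size=${3:-16777216}

    # An LSN is written as two hex numbers, "high/low".
    hi=$((0x${lsn%%/*}))
    lo=$((0x${lsn##*/}))

    # Segment number = 64-bit byte position / segment size.
    segno=$(( ((hi << 32) | lo) / seg_size ))

    # Number of segments per 4GB "xlogid" (256 for 16MB segments).
    per_id=$(( 0x100000000 / seg_size ))

    printf '%08X%08X%08X\n' "$tli" $((segno / per_id)) $((segno % per_id))
)

wal_file_name 1 0/3000000   # prints 000000010000000000000003
wal_file_name 1 0/2FFFFFF   # prints 000000010000000000000002
```

The last byte before 0/3000000 belongs to segment 2, which is the segment that patch 1 additionally archives in this scenario.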
The second attached patch fixes all of A to C, but seems somewhat
redundant.
Any opinions and/or suggestions are welcome.
The attached files are:
1. v1-0001-Make-sure-standby-archives-all-segments.patch:
Fix for A and B.
2. v1-0001-Make-sure-standby-archives-all-segments-immediate.patch:
Fix for A, B and C.
3. repro.sh
The reproducer shell script used below.
4. repro_helper.patch
Helper patch for repro.sh for master and patch 1 above.
5. repro_helper2.patch
Helper patch for repro.sh for patch 2 above.
=====
** REPRODUCER
The failure is reproducible with a small code tweak.
1. Create a primary server with archive_mode=always then start it.
2. Create and start a standby.
3. touch /tmp/hoge
4. psql -c "create table t(); drop table t; select pg_switch_wal(); select pg_sleep(1); create table t(); drop table t; select pg_switch_wal();"
5. Look into the archive directory of the standby.
If no missing segments are found in the archive, repeat from step 3.
The third attached shell script is a reproducer for the problem,
needing the aid of the fourth patch attached.
$ mkdir testdir
$ cd testdir
$ bash ..../repro.sh
....
After test 2:
Primary location: 0/8000310
Standby location: 0/8000310
# primary archive
000000010000000000000003
000000010000000000000004
000000010000000000000005
000000010000000000000006
000000010000000000000007
000000010000000000000008
# standby archive
000000010000000000000003
000000010000000000000005
000000010000000000000006
000000010000000000000008
Segment 4 is skipped because of issue A, and segment 7 is skipped
because of issue B.
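Spotting the gaps by eye gets tedious when the test is repeated, so the comparison can be scripted. The following is a minimal sketch, not part of repro.sh: missing_segments is a hypothetical helper, and the two directory arguments stand for wherever the primary and the standby archive_command write their files.

```sh
# Hypothetical helper: list WAL segments present in the primary's
# archive directory but absent from the standby's archive directory.
# Only 24-hex-digit segment names are considered, so .history and
# .backup files are ignored.
missing_segments() (
    primary_dir=$1
    standby_dir=$2

    ls "$primary_dir" | grep -E '^[0-9A-F]{24}$' | sort |
    while read -r name; do
        # Report the segment if the standby archive lacks it.
        [ -e "$standby_dir/$name" ] || printf '%s\n' "$name"
    done
)

# Usage (paths are placeholders):
#   missing_segments /path/to/primary/archive /path/to/standby/archive
```

For the listing above this would report segments 4 and 7. Note that it compares file names only and does not check file contents.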
regards.
--
Kyotaro Horiguchi
NTT Open Source Software Center
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Make-sure-standby-archives-all-segments.patch | text/x-patch | 1.5 KB |
| v1-0001-Make-sure-standby-archives-all-segments-immediate.patch | text/x-patch | 4.3 KB |
| repro_helper.patch | text/x-patch | 999 bytes |
| repro_helper2.patch | text/x-patch | 966 bytes |