Possible missing segments in archiving on standby

From: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Possible missing segments in archiving on standby
Date: 2020-06-30 07:55:03
Message-ID: 20200630.165503.1465894182551545886.horikyota.ntt@gmail.com
Lists: pgsql-hackers

Hello.

While looking at a patch, I found that a standby with archive_mode=always
fails to archive segments under certain conditions.

A. Walreceiver is gracefully terminated just after a segment is
finished.

B. Walreceiver is gracefully terminated while receiving the filler chunks
that pad out a segment after a segment switch.

Both of the above are reproducible (without distinguishing between the
two) using a helper patch. See below.

There's one more issue here.

C. Standby doesn't archive a segment until walreceiver receives any
data for the next segment.

I'm not sure whether we should regard C as an issue.
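
To make A and C easier to picture, here is a toy model in Python. It is not
PostgreSQL code: the class ToyWalReceiver and its methods are hypothetical,
and the only rule it encodes is the one stated in C, namely that a segment is
handed to the archiver only once data for a later segment arrives.

```python
class ToyWalReceiver:
    """Toy model only: a finished segment is notified to the archiver
    when data for a different (later) segment is received."""

    def __init__(self, archived):
        self.archived = archived  # shared list of segment numbers notified so far
        self.current = None       # segment currently being written, if any

    def receive(self, segno):
        # Receiving data for a new segment triggers the notification
        # for the segment that was being written until now.
        if self.current is not None and segno != self.current:
            self.archived.append(self.current)
        self.current = segno

    def terminate(self):
        # Graceful exit: in this model nothing is notified for the
        # segment that is still considered "current".
        self.current = None


archived = []

first = ToyWalReceiver(archived)
for seg in (3, 4):
    first.receive(seg)
first.terminate()          # stops right after segment 4 is finished (case A)

second = ToyWalReceiver(archived)
for seg in (5, 6):         # streaming restarts at segment 5
    second.receive(seg)

print(archived)            # [3, 5]
```

In this model segment 4 is never notified (case A), and segment 6 stays
unarchived until something arrives for segment 7 (case C).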

The first attached patch fixes A and B. A side effect is that the
standby also archives the segment preceding the streaming start
location. Concretely, 00..0100..2 gets archived in the above case
(recovery starts at 0/3000000). That behavior doesn't seem to be a
problem, since the segment is part of the standby's data anyway.
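
The arithmetic behind "0/3000000" and "00..0100..2" can be sketched as
follows. This assumes the default 16 MB wal_segment_size and timeline 1; the
helper names are made up for illustration. The file name layout (timeline,
then the two halves of the segment number, each as 8 hex digits) follows the
usual WAL file naming.

```python
WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumption: default wal_segment_size


def parse_lsn(lsn):
    """Convert an LSN such as '0/3000000' into a 64-bit byte position."""
    high, low = lsn.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def segment_number(lsn, seg_size=WAL_SEGMENT_SIZE):
    """Number of the segment containing the byte at this LSN."""
    return parse_lsn(lsn) // seg_size


def segment_file_name(timeline, segno, seg_size=WAL_SEGMENT_SIZE):
    """WAL file name for a segment number on the given timeline."""
    segs_per_xlogid = 0x100000000 // seg_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


start = segment_number("0/3000000")
print(segment_file_name(1, start))      # 000000010000000000000003
print(segment_file_name(1, start - 1))  # 000000010000000000000002
```

So streaming starts in segment 3, and the segment preceding it is
000000010000000000000002.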

The second attached patch fixes all of A to C, but seems somewhat
redundant.

Any opinions and/or suggestions are welcome.

The attached files are:

1. v1-0001-Make-sure-standby-archives-all-segments.patch:
Fix for A and B.

2. v1-0001-Make-sure-standby-archives-all-segments-immediate.patch:
Fix for A, B and C.

3. repro.sh
The reproducer shell script used below.

4. repro_helper.patch
Helper patch for repro.sh for master and patch 1 above.

5. repro_helper2.patch
Helper patch for repro.sh for patch 2 above.

=====
** REPRODUCER

The failure is reproducible with a small code tweak.

1. Create a primary server with archive_mode=always then start it.
2. Create and start a standby.
3. touch /tmp/hoge

4. psql -c "create table t(); drop table t; select pg_switch_wal(); select pg_sleep(1); create table t(); drop table t; select pg_switch_wal();"

5. Look in the archive directory of the standby.
If no segments are missing from the archive, repeat from step 3.
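
The check in step 5 can be automated. Below is a minimal sketch that looks
for holes in a list of archived WAL file names. It assumes 16 MB segments and
a single timeline; find_gaps is a hypothetical helper, and the sample listing
is made up for illustration.

```python
# Assumption: default 16 MB segments, i.e. 256 segments per 4 GB "xlog id".
SEGS_PER_XLOGID = 0x100000000 // (16 * 1024 * 1024)


def find_gaps(names):
    """Return WAL file names missing between the lowest and highest
    name in the listing (all names assumed to be on one timeline)."""
    if not names:
        return []
    timeline = names[0][:8]
    # Characters 8-15 are the high part, 16-23 the low part of the segment number.
    present = {
        int(name[8:16], 16) * SEGS_PER_XLOGID + int(name[16:24], 16)
        for name in names
    }
    gaps = []
    for segno in range(min(present), max(present) + 1):
        if segno not in present:
            gaps.append(
                "%s%08X%08X"
                % (timeline, segno // SEGS_PER_XLOGID, segno % SEGS_PER_XLOGID)
            )
    return gaps


# Made-up listing for illustration only.
listing = [
    "000000010000000000000001",
    "000000010000000000000002",
    "000000010000000000000004",
]
print(find_gaps(listing))  # ['000000010000000000000003']
```

In practice the listing would come from the standby's archive directory,
for example via os.listdir(), filtered down to 24-character WAL file names.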

The third attached shell script is a reproducer for the problem,
needing the aid of the fourth patch attached.

$ mkdir testdir
$ cd testdir
$ bash ..../repro.sh
....
After test 2:
Primary location: 0/8000310
Standby location: 0/8000310
# primary archive
000000010000000000000003
000000010000000000000004
000000010000000000000005
000000010000000000000006
000000010000000000000007
000000010000000000000008
# standby archive
000000010000000000000003
000000010000000000000005
000000010000000000000006
000000010000000000000008

Segment 4 is skipped because of issue A, and segment 7 because of
issue B.
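
For reference, the comparison of the two listings above can be done
mechanically. A minimal sketch, using the file names from the output above;
missing_segments is a hypothetical helper name.

```python
def missing_segments(primary_archive, standby_archive):
    """Segments present in the primary's archive but absent from the standby's."""
    return sorted(set(primary_archive) - set(standby_archive))


# File names copied from the repro.sh output above.
primary = [
    "000000010000000000000003",
    "000000010000000000000004",
    "000000010000000000000005",
    "000000010000000000000006",
    "000000010000000000000007",
    "000000010000000000000008",
]
standby = [
    "000000010000000000000003",
    "000000010000000000000005",
    "000000010000000000000006",
    "000000010000000000000008",
]

print(missing_segments(primary, standby))
# ['000000010000000000000004', '000000010000000000000007']
```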

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

Attachment                                                       Content-Type  Size
v1-0001-Make-sure-standby-archives-all-segments.patch            text/x-patch  1.5 KB
v1-0001-Make-sure-standby-archives-all-segments-immediate.patch  text/x-patch  4.3 KB
repro_helper.patch                                               text/x-patch  999 bytes
repro_helper2.patch                                              text/x-patch  966 bytes
