Re: pg_rewind WAL segments deletion pitfall

From: Alexander Kukushkin <cyberdemn(at)gmail(dot)com>
To: Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com>
Cc: bungina(at)gmail(dot)com, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: pg_rewind WAL segments deletion pitfall
Date: 2022-08-30 06:49:27
Message-ID: CAFh8B=kyrzXbsuyhM-Fydu6TG3kyu9=AFCyf4tG4cYrfw3897A@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

Hello Kyotaro,

On Tue, 30 Aug 2022 at 07:50, Kyotaro Horiguchi <horikyota(dot)ntt(at)gmail(dot)com> wrote:

> So, if I understand you correctly, the issue you are complaining about is
> not the WAL segments on the old timeline but those on the
> new timeline, which have nothing to do with what pg_rewind does. As
> in the case of pg_basebackup, the missing segments need to
> be copied somehow from the new primary, since the old primary never had
> the chance to have them before.
>

No, we are complaining exactly about the WAL segments from the old timeline
that pg_rewind removes.
Those segments were never archived by the old primary, and the new primary
has already recycled them.
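To see which WAL files are at risk before running pg_rewind, one can look at the archiver's status files: PostgreSQL leaves a `<walfile>.ready` marker in `pg_wal/archive_status` for every WAL file that `archive_command` has not yet archived successfully. The function below is a minimal sketch assuming that standard directory layout; `list_unarchived_wal` is a hypothetical helper, not part of PostgreSQL.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of PostgreSQL): print the WAL files that the
# server has marked as ready for archiving but that archive_command has not
# archived yet. Assumes the standard layout
#   $PGDATA/pg_wal/archive_status/<walfile>.ready
list_unarchived_wal() {
  local pgdata=$1
  local marker
  for marker in "$pgdata"/pg_wal/archive_status/*.ready; do
    # If nothing matches, the glob is left unexpanded; skip that literal.
    [ -e "$marker" ] || continue
    # Strip the directory and the ".ready" suffix to get the WAL file name.
    basename "$marker" .ready
  done
}
```

For example, running `list_unarchived_wal oldprim` after stopping the old primary and before pg_rewind should print the WAL files written while `archive_command = 'false'` was in effect in the reproduction script; once the new primary has recycled its copies, those files exist only in the old primary's pg_wal.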

>
> Thus I don't follow this.
>

I slightly modified your script so that it reproduces the problem.

====
mkdir newarch oldarch
initdb -k -D oldprim
echo "archive_mode = 'on'">> oldprim/postgresql.conf
echo "archive_command = 'echo \"archive %f\" >&2; cp %p $(pwd)/oldarch/%f'" >> oldprim/postgresql.conf
pg_ctl -D oldprim -o '-p 5432' -l oldprim.log start
psql -p 5432 -c 'create table t(a int)'
pg_basebackup -D newprim -p 5432
echo "primary_conninfo='host=/tmp port=5432'">> newprim/postgresql.conf
echo "archive_command = 'echo \"archive %f\" >&2; cp %p $(pwd)/newarch/%f'" >> newprim/postgresql.conf
touch newprim/standby.signal
pg_ctl -D newprim -o '-p 5433' -l newprim.log start

# the last common checkpoint
psql -p 5432 -c 'checkpoint'

# old primary cannot archive any more
echo "archive_command = 'false'">> oldprim/postgresql.conf
pg_ctl -D oldprim reload
# advance WAL on the old primary; four WAL segments will never make it to the archive
for i in $(seq 1 4); do psql -p 5432 -c 'insert into t values(0); select pg_switch_wal();'; done

# record the approximate diverging WAL segment
start_wal=$(psql -p 5432 -Atc "select pg_walfile_name(pg_current_wal_lsn() - (select setting from pg_settings where name = 'wal_segment_size')::int);")
pg_ctl -D newprim promote

# the old primary loses the diverging WAL segment
for i in $(seq 1 4); do psql -p 5432 -c 'insert into t values(0); select pg_switch_wal();'; done
psql -p 5432 -c 'checkpoint;'
psql -p 5433 -c 'checkpoint;'

pg_ctl -D oldprim stop

# rewind the old primary, using its own archive
# pg_rewind -D oldprim --source-server='port=5433' # should fail
echo "restore_command = 'echo \"restore %f\" >&2; cp $(pwd)/oldarch/%f %p'" >> oldprim/postgresql.conf
pg_rewind -D oldprim --source-server='port=5433' -c

# advance WAL on the new primary; it loses the WAL segment needed to start the rewound old primary
for i in $(seq 1 4); do psql -p 5433 -c 'insert into t values(0); select pg_switch_wal();'; done
psql -p 5433 -c 'checkpoint'
echo "primary_conninfo='host=/tmp port=5433'">> oldprim/postgresql.conf
touch oldprim/standby.signal

postgres -D oldprim # fails with "WAL file has been removed"

# Alternative: copy in WAL from the new primary's archive, e.g. via restore_command
# echo "restore_command = 'echo \"restore %f\" >&2; cp $(pwd)/newarch/%f %p'" >> oldprim/postgresql.conf

# copy-in WAL files from new primary's archive to old primary
(cd newarch || exit 1
 for f in *; do
   if [[ "$f" > "$start_wal" ]]; then
     echo "copy $f"
     cp "$f" ../oldprim/pg_wal/
   fi
 done)

postgres -D oldprim # also fails with "requested WAL segment XXX has already been removed"
===
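The copy-in loop in the script compares WAL file names as strings. A segment file name consists of three 8-digit hexadecimal fields (timeline, high 32 bits of the segment position, and segment number within that 4 GB range), so the string comparison orders by timeline first: every timeline 2 file sorts after every timeline 1 file, whatever its segment number. The helpers below are a sketch under the assumption of the standard naming scheme and a known wal_segment_size; `wal_timeline` and `wal_segno` are hypothetical names made up for illustration, not PostgreSQL tools.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of PostgreSQL) that decode a 24-hex-digit WAL
# segment file name TTTTTTTTXXXXXXXXYYYYYYYY. They assume the argument is a
# plain segment name (not a .history or .partial file).

# Print the timeline (first 8 hex digits) as a decimal number.
wal_timeline() {
  local name=$1
  echo $(( 16#${name:0:8} ))
}

# Print the absolute segment number. The second argument is the segment
# size in bytes (wal_segment_size, 16 MB by default).
wal_segno() {
  local name=$1 segsize=$2
  # Number of segments in one 4 GB range (the middle 8 hex digits).
  local segs_per_id=$(( 0x100000000 / segsize ))
  echo $(( 16#${name:8:8} * segs_per_id + 16#${name:16:8} ))
}
```

For example, `wal_segno 000000010000000100000002 16777216` prints 258 with the default 16 MB segments, so timeline and segment position can be compared separately instead of relying on string order.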

Regards,
--
Alexander Kukushkin
