Re: Why standby restores some WALs many times from archive?

From: Sergey Burladyan <eshkinkot(at)gmail(dot)com>
To: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
Cc: Victor Yagofarov <xnasx(at)yandex(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Why standby restores some WALs many times from archive?
Date: 2017-12-30 01:30:07
Message-ID: 87efndt334.fsf@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Jeff Janes <jeff(dot)janes(at)gmail(dot)com> writes:

> On Thu, Dec 28, 2017 at 7:02 AM, Victor Yagofarov <xnasx(at)yandex(dot)ru> wrote:
> > I have postgres 9.4 standby with archive-based replication (via
> > restore_command).

> Can you show us both your archive_command and your restore_command?

We use this scripts:
https://github.com/avito-tech/dba-utils/tree/master/pg_archive

But I can reproduce problem with simple cp & mv:
archive_command:
test ! -f /var/lib/postgresql/wals/%f && \
test ! -f /var/lib/postgresql/wals/%f.tmp && \
cp %p /var/lib/postgresql/wals/%f.tmp && \
mv /var/lib/postgresql/wals/%f.tmp /var/lib/postgresql/wals/%f

recovery.conf:
restore_command = 'cp /var/lib/postgresql/wals/%f %p'
archive_cleanup_command = 'pg_archivecleanup /var/lib/postgresql/wals/ %r'
standby_mode = on

I use docker for tests and run standby with DEBUG3 (full log in attachments).

I run this commands in test:
PGVER=9.4 docker-compose up
docker-compose exec -u postgres master psql \
-c "create database test"
docker-compose exec -u postgres master psql -d test \
-c "create table xxx (id serial, v text)"
# we need some recycled WALs
docker-compose exec -u postgres master psql -d test \
-c "insert into xxx (v) select 'this is long line for fill space ' || n from generate_series(1, 5000000) n"
docker-compose exec -u postgres standby psql -c "checkpoint"
docker-compose exec -u postgres master psql -d test \
-c "insert into xxx (v) select 'this is long line for fill space ' || n from generate_series(1, 1000000) n; truncate xxx;"
docker-compose logs standby | grep 'restored log file' | sort | uniq -c | awk '$1 > 1 { print }'
docker-compose down -v

Log have two duplicated WALS:
4 | 000000010000000000000024
3 | 00000001000000000000002B

========================================
DEBUG: switched WAL source from stream to archive after failure
DEBUG: executing restore command "cp /var/lib/postgresql/wals/000000010000000000000024 pg_xlog/RECOVERYXLOG"
LOG: restored log file "000000010000000000000024" from archive
DEBUG: got WAL segment from archive
DEBUG: skipping restartpoint, already performed at 0/22258058
DEBUG: executing restore command "cp /var/lib/postgresql/wals/000000010000000000000025 pg_xlog/RECOVERYXLOG"
cp: cannot stat ‘/var/lib/postgresql/wals/000000010000000000000025’: No such file or directory
DEBUG: could not restore file "000000010000000000000025" from archive: child process exited with exit code 1
DEBUG: unexpected pageaddr 0/10000000 in log segment 000000010000000000000025, offset 0
DEBUG: switched WAL source from archive to stream after failure
DEBUG: switched WAL source from stream to archive after failure
DEBUG: executing restore command "cp /var/lib/postgresql/wals/000000010000000000000024 pg_xlog/RECOVERYXLOG"
LOG: restored log file "000000010000000000000024" from archive
DEBUG: got WAL segment from archive
DEBUG: skipping restartpoint, already performed at 0/22258058
DEBUG: executing restore command "cp /var/lib/postgresql/wals/000000010000000000000025 pg_xlog/RECOVERYXLOG"
cp: cannot stat ‘/var/lib/postgresql/wals/000000010000000000000025’: No such file or directory
DEBUG: could not restore file "000000010000000000000025" from archive: child process exited with exit code 1
DEBUG: unexpected pageaddr 0/10000000 in log segment 000000010000000000000025, offset 0
DEBUG: switched WAL source from archive to stream after failure
DEBUG: switched WAL source from stream to archive after failure
DEBUG: executing restore command "cp /var/lib/postgresql/wals/000000010000000000000024 pg_xlog/RECOVERYXLOG"
LOG: restored log file "000000010000000000000024" from archive
DEBUG: got WAL segment from archive
DEBUG: skipping restartpoint, already performed at 0/22258058
========================================

========================================
LOG: restartpoint starting: xlog
DEBUG: performing replication slot checkpoint
DEBUG: switched WAL source from stream to archive after failure
DEBUG: executing restore command "cp /var/lib/postgresql/wals/00000001000000000000002C pg_xlog/RECOVERYXLOG"
cp: cannot stat ‘/var/lib/postgresql/wals/00000001000000000000002C’: No such file or directory
DEBUG: could not restore file "00000001000000000000002C" from archive: child process exited with exit code 1
LOG: unexpected pageaddr 0/21000000 in log segment 00000001000000000000002C, offset 0
DEBUG: switched WAL source from archive to stream after failure
DEBUG: switched WAL source from stream to archive after failure
DEBUG: executing restore command "cp /var/lib/postgresql/wals/00000001000000000000002B pg_xlog/RECOVERYXLOG"
LOG: restored log file "00000001000000000000002B" from archive
DEBUG: got WAL segment from archive
DEBUG: executing restore command "cp /var/lib/postgresql/wals/00000001000000000000002C pg_xlog/RECOVERYXLOG"
cp: cannot stat ‘/var/lib/postgresql/wals/00000001000000000000002C’: No such file or directory
DEBUG: could not restore file "00000001000000000000002C" from archive: child process exited with exit code 1
DEBUG: unexpected pageaddr 0/21000000 in log segment 00000001000000000000002C, offset 0
DEBUG: switched WAL source from archive to stream after failure
DEBUG: switched WAL source from stream to archive after failure
DEBUG: executing restore command "cp /var/lib/postgresql/wals/00000001000000000000002B pg_xlog/RECOVERYXLOG"
LOG: restored log file "00000001000000000000002B" from archive
DEBUG: got WAL segment from archive
DEBUG: executing restore command "cp /var/lib/postgresql/wals/00000001000000000000002C pg_xlog/RECOVERYXLOG"
LOG: restored log file "00000001000000000000002C" from archive
DEBUG: got WAL segment from archive
========================================

--
Sergey Burladyan

Attachment Content-Type Size
standby.log.bz2 application/x-bzip2 5.6 KB
pg-wal-arch-test.tar.bz2 application/x-tar 1.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-12-30 03:39:05 Re: Converting plpgsql to use DTYPE_REC for named composite types
Previous Message Tatsuo Ishii 2017-12-29 23:40:57 Re: Fix a Oracle-compatible instr function in the documentation