Standby receiving part of missing WAL segment

From: Thom Brown <thom(at)linux(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Standby receiving part of missing WAL segment
Date: 2015-02-11 17:55:57
Message-ID: CAA-aLv5StMF=oeoP9WbjEbWuj+Y-EKqBhcp=5aP7WYvO_kSPhw@mail.gmail.com
Lists: pgsql-hackers

Hi,

Today I witnessed a situation which appears to have gone down like this:

- The primary server started streaming WAL data from segment 00A8 to the
standby
- The standby server started receiving that data
- Before 00A8 was finished, the WAL sender process died on the primary, but
the archiver process continued, and 00A8 ended up being archived as usual
- The primary continues to generate WAL and cleans up old WAL files from
pg_xlog until 00A8 is gone.
- The primary is restarted and the wal sender process is back up and running
- The standby says "waiting for 00A8", which it can no longer get from the
primary
- 00A8 is in the standby's archive directory, but the standby is waiting
for the rest of the segment from the primary via streaming replication, so
it doesn't check the archive
- The standby is restarted
- The standby goes back into recovery and eventually replays 00A8 and
continues as normal.

Should the standby be able to get feedback from the primary that the
requested segment is no longer available, and therefore know to check its
archive?

Or should it check the archive anyway if it hasn't received any further WAL
data via the streaming replication connection after a certain amount of
time?
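
To make the second option concrete, here is a rough sketch of the decision
I have in mind. It is not PostgreSQL code: the function name, the
stall_timeout parameter and the WalSource enum are all made up for
illustration, and it assumes the standby can cheaply tell whether the
segment it needs is already in its archive.

```python
from enum import Enum


class WalSource(Enum):
    STREAM = "stream"
    ARCHIVE = "archive"


def choose_wal_source(now, last_stream_data_at, stall_timeout, segment_in_archive):
    """Pick where a standby should look for the WAL it is waiting on.

    Hypothetical policy, not PostgreSQL's actual behaviour: keep using
    streaming replication while data is arriving, but if nothing has been
    received for at least stall_timeout seconds and the needed segment is
    present in the archive, fall back to the archive.
    """
    stalled = (now - last_stream_data_at) >= stall_timeout
    if stalled and segment_in_archive:
        return WalSource.ARCHIVE
    return WalSource.STREAM


# The situation described above: no streamed data for 40 seconds,
# and 00A8 is already sitting in the archive.
print(choose_wal_source(now=100.0, last_stream_data_at=60.0,
                        stall_timeout=30.0, segment_in_archive=True))
```

With a policy like this the standby would have picked up 00A8 from the
archive on its own instead of needing a restart.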

At the moment, the standby gets stuck forever in this situation, even
though it has access to the WAL it needs.

Thom
