BUG #6170: hot standby wedging on full-WAL disk

From: "Daniel Farina" <daniel(at)heroku(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #6170: hot standby wedging on full-WAL disk
Date: 2011-08-20 00:39:58
Message-ID: 201108200039.p7K0dwle083531@wwwmaster.postgresql.org
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 6170
Logged by: Daniel Farina
Email address: daniel(at)heroku(dot)com
PostgreSQL version: 9.0.4
Operating system: GNU/Linux Ubuntu 10.04 x86_64
Description: hot standby wedging on full-WAL disk
Details:

After seeing this a few times, I think I've found a reproducible way to
prevent Postgres from making progress with hot standby.

1) Set up a WAL disk that will run out of space in a reasonable amount of
time.

2) Run a hot standby with a restore_command and primary_conninfo set
in recovery.conf. ***Configure it to disable query cancellation***.

3) Begin a transaction or long-running statement on the standby that
prevents the application of WAL records.
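
As a rough illustration of step 2, the sketch below renders the standby
settings as config-file text. It is only a guess at a configuration matching
the description: the host, user and archive path are hypothetical
placeholders, primary_conninfo is the recovery.conf name for the connection
setting, and using max_standby_streaming_delay = -1 and
max_standby_archive_delay = -1 to disable query cancellation is an
assumption, since the report does not say how cancellation was disabled.

```python
def render_settings(settings):
    """Render a dict as "name = 'value'" lines, doubling embedded quotes."""
    lines = []
    for name, value in settings.items():
        escaped = str(value).replace("'", "''")
        lines.append(f"{name} = '{escaped}'")
    return "\n".join(lines) + "\n"


# recovery.conf on the standby: streaming plus archive restore.
# Host, user and archive path are hypothetical placeholders.
RECOVERY_CONF = {
    "standby_mode": "on",
    "primary_conninfo": "host=primary.example.com user=replicator",
    "restore_command": "cp /mnt/archive/%f %p",
}

# postgresql.conf on the standby. A delay of -1 means "wait forever",
# so conflicting standby queries are never cancelled (assumed to be what
# the report means by disabling query cancellation).
STANDBY_CONF = {
    "hot_standby": "on",
    "max_standby_streaming_delay": "-1",
    "max_standby_archive_delay": "-1",
}

if __name__ == "__main__":
    print("# recovery.conf")
    print(render_settings(RECOVERY_CONF))
    print("# postgresql.conf")
    print(render_settings(STANDBY_CONF))
```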

When the hot standby falls behind the primary, it will eventually drop out
of streaming mode and accumulate WAL until the disk fills.

Once the WAL disk is full, the hot standby cannot make any progress until
one deletes some WAL segments or otherwise makes a tiny bit more room to
work with. This state persists past killing the offending
long-running-transaction backend and even a postgres restart. In the latter
case, the standby cannot even become 'hot' again: connections get the
"database system is starting up" message, as Postgres wants to run a
restore_command immediately.
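
To watch for this condition before the standby wedges, one could track how
many more WAL segments fit on the WAL disk. The following is a minimal
sketch, not part of the original report: it assumes the default 16 MB
segment size, and the function names are made up for illustration.

```python
import shutil

# Default WAL segment size; an assumption, since it can be changed at build time.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024


def free_wal_segments(free_bytes, segment_bytes=WAL_SEGMENT_BYTES):
    """Number of whole WAL segments that fit in free_bytes."""
    return free_bytes // segment_bytes


def wal_headroom(wal_dir):
    """Whole WAL segments that still fit on the filesystem holding wal_dir."""
    return free_wal_segments(shutil.disk_usage(wal_dir).free)
```

For example, wal_headroom("/path/to/pg_xlog") returning 0 would mean that not
even one more segment can be written or restored.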

Furthermore, it appears that WAL segments from the future part of the
timeline (beyond what is being recovered at the moment) are stored on-disk
at that time. I also think I have identified some WAL segments that are
from before the prior checkpoint location via pg_controldata, so they
technically could be pruned. My wal_keep_segments is set, but I am not sure
if this has an effect on a hot standby.
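
The comparison against the checkpoint location can be sketched as follows.
This is an illustration under stated assumptions, not a pruning tool: it
assumes 16 MB segments, takes the timeline and REDO location as printed by
pg_controldata (for example timeline 1 and "0/3000020"), and only lists
same-timeline segment files that sort before the REDO segment. Whether such
files are actually safe to remove is exactly the open question here. The
function names are hypothetical.

```python
import re

# Default WAL segment size; an assumption, since it can be changed at build time.
WAL_SEGMENT_BYTES = 16 * 1024 * 1024

# WAL segment file names are 24 upper-case hex digits:
# 8 for the timeline, 8 for the log id, 8 for the segment within the log.
_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}$")


def segment_name(timeline, location, segment_bytes=WAL_SEGMENT_BYTES):
    """Map a timeline and an 'X/Y' WAL location to its segment file name."""
    log_id, offset = (int(part, 16) for part in location.split("/"))
    return "%08X%08X%08X" % (timeline, log_id, offset // segment_bytes)


def segments_before(redo_segment, filenames):
    """Same-timeline segment files that sort strictly before redo_segment."""
    timeline = redo_segment[:8]
    return sorted(
        name
        for name in filenames
        if _SEGMENT_RE.match(name)
        and name[:8] == timeline
        and name < redo_segment
    )
```

Called with the file names in the WAL directory, segments_before() skips
.history and other non-segment files and ignores segments from other
timelines.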
