Re: BUG #6170: hot standby wedging on full-WAL disk

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: Daniel Farina <daniel(at)heroku(dot)com>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #6170: hot standby wedging on full-WAL disk
Date: 2011-08-22 06:57:29
Message-ID: 4E51FDD9.6050403@enterprisedb.com
Lists: pgsql-bugs

On 20.08.2011 03:39, Daniel Farina wrote:
>
> The following bug has been logged online:
>
> Bug reference: 6170
> Logged by: Daniel Farina
> Email address: daniel(at)heroku(dot)com
> PostgreSQL version: 9.0.4
> Operating system: GNU/Linux Ubuntu 10.04 x86_64
> Description: hot standby wedging on full-WAL disk
> Details:
>
> After seeing this a few times, I think I've found a reproducible way to
> prevent Postgres from making progress with hot standby.
>
> 1) Set up a WAL disk that will run out of space in a reasonable amount of
> time.
>
> 2) Run a hot standby with a restore_command and primary_conninfo set
> in recovery.conf. ***Configure it to disable query cancellation***.
>
> 3) Begin a transaction, or long-running statement that prevents the
> application of WAL records.
>
> When the hot standby falls behind the primary it'll eventually bump out of
> streaming mode, and will accumulate WAL until the disk fills.
>
> Eventually the WAL disk will fill, and the hot standby cannot make any
> progress until one deletes some WAL segments or otherwise makes a tiny bit
> more room to work with. This state persists past killing the offending
> long-running-transaction backend and even a postgres restart. In the latter
> case, one cannot even become 'hot' again, getting the "database system is
> starting up" message, as Postgres wants to run a restore_command
> immediately.
>
> Furthermore, it appears that WAL segments from the future part of the
> timeline (beyond what is being recovered at the moment) are stored on-disk
> at that time. I also think I have identified some WAL segments that are
> from before the prior checkpoint location via pg_controldata, so they
> technically could be pruned. My wal_keep_segments is set, but I am not sure
> if this has an effect on a hot standby.

So the problem is that walreceiver merrily writes so much future WAL
that it runs out of disk space? A limit on the maximum number of future
WAL files to stream ahead would fix that, but I can't get very excited
about it. Usually you do want to stream as much ahead as you can, to
ensure that the WAL is safely on disk on the standby, in case the master
dies. So the limit would need to be configurable.
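
A configurable stream-ahead limit of the kind described above might look roughly like the following. It is only a sketch, not a proposed patch: the setting `max_segments_ahead` and the function `may_stream_ahead` are hypothetical names, the 16 MB segment size is assumed to be the default, and WAL locations are modeled as flat 64-bit byte offsets. A value of 0 keeps the current behavior of streaming as far ahead as possible.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: the default 16 MB WAL segment size. */
#define WAL_SEG_SIZE ((uint64_t) 16 * 1024 * 1024)

/*
 * Hypothetical check: may the walreceiver write WAL up to byte position
 * `write_upto` while recovery has only replayed up to `replayed_upto`?
 * `max_segments_ahead` is a hypothetical setting; 0 means "no limit".
 */
int may_stream_ahead(uint64_t max_segments_ahead,
                     uint64_t replayed_upto, uint64_t write_upto)
{
    uint64_t replay_seg, write_seg;

    if (max_segments_ahead == 0)
        return 1;               /* unlimited: stream as far ahead as possible */
    if (write_upto <= replayed_upto)
        return 1;               /* not ahead of replay at all */

    replay_seg = replayed_upto / WAL_SEG_SIZE;
    write_seg = write_upto / WAL_SEG_SIZE;

    /* Number of segment files the write would run ahead of replay. */
    return write_seg - replay_seg <= max_segments_ahead;
}
```

Defaulting to unlimited preserves the durability argument: WAL lands on the standby's disk as soon as it arrives. When the check fails, the walreceiver would presumably stop consuming from the connection until replay catches up, trading durability on the standby for bounded disk use.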

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
