Re: Hard limit on WAL space used (because PANIC sucks)

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Craig Ringer" <craig(at)2ndquadrant(dot)com>
Cc: "Josh Berkus" <josh(at)agliodbs(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hard limit on WAL space used (because PANIC sucks)
Date: 2013-06-09 22:39:35
Message-ID: DF0B1E1C6BD54895B16AAED7CFD94413@maumau
Lists: pgsql-hackers

From: "Craig Ringer" <craig(at)2ndquadrant(dot)com>
> On 06/09/2013 08:32 AM, MauMau wrote:
>>
>> - Failure of a disk containing data directory or tablespace
>> If checkpoint can't write buffers to disk because of disk failure,
>> checkpoint cannot complete, thus WAL files accumulate in pg_xlog/.
>> This means that one disk failure will lead to postgres shutdown.
>
> I've seen a couple of people bitten by the misunderstanding that
> tablespaces are a way to split up your data based on different
> reliability requirements, and I really need to write a docs patch for
> http://www.postgresql.org/docs/current/static/manage-ag-tablespaces.html
> <http://www.postgresql.org/docs/9.2/static/manage-ag-tablespaces.html>
> that adds a prominent warning like:
>
> WARNING: Every tablespace must be present before the database can be
> started. There is no easy way to recover the database if a tablespace is
> lost to disk failure, deletion, use of volatile storage, etc. <b>Do not
> put a tablespace on a RAM disk</b>; instead just use UNLOGGED tables.
>
> (Opinions on the above?)

Yes, I'm sure it is useful for DBAs to know how postgres behaves so that they
can prepare for it. However, this does not apply to my case, because I'm
using tablespaces for I/O distribution across multiple disks and simply for
database capacity.

The problem is that the reliability of the database system decreases with
more disks, because failure of any one of those disks would result in a
database PANIC shutdown.
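
To put a rough number on this, here is a small sketch. It assumes every disk
fails independently with the same probability over some period; the 3% figure
and the function name are illustrative assumptions, not measured data.

```python
def any_disk_failure_probability(p_single: float, n_disks: int) -> float:
    """Probability that at least one of n_disks fails.

    Assumes failures are independent and equally likely for every disk,
    which is a simplification made only for this illustration.
    """
    return 1.0 - (1.0 - p_single) ** n_disks


# Hypothetical 3% failure probability per disk over the period considered.
for n in (1, 2, 4, 8):
    print(n, round(any_disk_failure_probability(0.03, n), 4))
```

If any single disk failure stops the whole server, this value is also the
probability of a server outage, and it grows with every disk added.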

> I'd rather like to be able to recover from this by treating the
> tablespace as dead, so any attempt to get a lock on any table within it
> fails with an error and already-in-WAL writes to it just get discarded.
> It's the sort of thing that'd only be reasonable to do as a recovery
> option (like zero_damaged_pages) since if applied by default it'd lead
> to potentially severe and unexpected data loss.

I'm in favor of taking a tablespace offline when an I/O failure is
encountered, and continuing to run the database server. But WAL must not be
discarded, because committed transactions must be preserved to satisfy the
durability requirement of ACID.

Postgres needs to take these steps when it encounters an I/O error:

1. Take the tablespace offline, so that subsequent reads and writes against it
return an error without actually issuing I/O against the data files.

2. Discard shared buffers containing data in the tablespace.

WAL is not affected by the offlining of tablespaces. WAL records already
written to the WAL buffers will be written to pg_xlog/ and archived as usual.
Those WAL records will be used to recover committed transactions during
archive recovery.
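
The behavior described above can be sketched as a toy model. This is not
PostgreSQL code: ToyServer, TablespaceOfflineError and on_io_error are
hypothetical names made up for illustration, and the "WAL" here is just a
Python list.

```python
from dataclasses import dataclass, field


class TablespaceOfflineError(Exception):
    """Raised when a read or write targets an offline tablespace."""


@dataclass
class ToyServer:
    """Toy model only; none of these names exist in PostgreSQL."""
    offline: set = field(default_factory=set)
    shared_buffers: dict = field(default_factory=dict)  # (tablespace, page) -> data
    wal: list = field(default_factory=list)

    def write(self, tablespace, page, data):
        # Fail fast, without touching the data files of an offline tablespace.
        if tablespace in self.offline:
            raise TablespaceOfflineError(tablespace)
        self.wal.append((tablespace, page, data))  # log the change first
        self.shared_buffers[(tablespace, page)] = data

    def read(self, tablespace, page):
        if tablespace in self.offline:
            raise TablespaceOfflineError(tablespace)
        return self.shared_buffers[(tablespace, page)]

    def on_io_error(self, tablespace):
        # Step 1: take the tablespace offline.
        self.offline.add(tablespace)
        # Step 2: discard shared buffers holding data of that tablespace.
        self.shared_buffers = {
            key: value
            for key, value in self.shared_buffers.items()
            if key[0] != tablespace
        }
        # The WAL list is deliberately left untouched.


server = ToyServer()
server.write("ts_a", 1, "row in A")
server.write("ts_b", 1, "row in B")
server.on_io_error("ts_b")

print(server.read("ts_a", 1))  # the other tablespace is still usable
print(len(server.wal))         # both WAL records are still there
try:
    server.write("ts_b", 2, "more")
except TablespaceOfflineError as exc:
    print("rejected:", exc)
```

The point of the sketch is that only the failed tablespace becomes
unavailable, while the WAL records for it remain and can be replayed later.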

Regards
MauMau
