Re: Hard limit on WAL space used (because PANIC sucks)

From: "MauMau" <maumau307(at)gmail(dot)com>
To: "Daniel Farina" <daniel(at)heroku(dot)com>, "Josh Berkus" <josh(at)agliodbs(dot)com>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Jeff Janes" <jeff(dot)janes(at)gmail(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Hard limit on WAL space used (because PANIC sucks)
Date: 2013-06-08 14:36:51
Message-ID: 879A3981DBFB4B2D8B3A92C1B5E5CED6@maumau
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

From: "Daniel Farina" <daniel(at)heroku(dot)com>
> On Fri, Jun 7, 2013 at 12:14 PM, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
>> Right now, what we're telling users is "You can have continuous backup
>> with Postgres, but you'd better hire and expensive consultant to set it
>> up for you, or use this external tool of dubious provenance which
>> there's no packages for, or you might accidentally cause your database
>> to shut down in the middle of the night."
>>
>> At which point most sensible users say "no thanks, I'll use something
>> else".

> Inverted and just as well supported: "if you want to not accidentally
> lose data, you better hire an expensive consultant to check your
> systems for all sorts of default 'safety = off' features." This
> being but the hypothetical first one.
>
> Furthermore, I see no reason why high quality external archiving
> software cannot exist. Maybe some even exists already, and no doubt
> they can be improved and the contract with Postgres enriched to that
> purpose.
>
> Finally, it's not that hard to teach any archiver how to no-op at
> user-peril, or perhaps Postgres can learn a way to do this expressly
> to standardize the procedure a bit to ease publicly shared recipes,
> perhaps.

Yes, I feel designing reliable archiving, even for the simplest case - copy
WAL to disk, is very difficult. I know there are following three problems
if you just follow the PostgreSQL manual. Average users won't notice them.
I guess even professional DBAs migrating from other DBMSs won't, either.

1. If the machine or postgres crashes while archive_command is copying a WAL
file, later archive recovery fails.
This is because cp leaves a file of less than 16MB in archive area, and
postgres refuses to start when it finds such a small archive WAL file.
The solution, which IIRC Tomas san told me here, is to do like "cp %p
/archive/dir/%f.tmp && mv /archive/dir/%f.tmp /archive/dir/%f".

2. archive_command dumps core when you run pg_ctl stop -mi.
This is because postmaster sends SIGQUIT to all its descendants. The core
files accumulate in the data directory, which will be backed up with the
database. Of course those core files are garbage.
archive_command script needs to catch SIGQUIT and exit.

3. You cannot know the reason of archive_command failure (e.g. archive area
full) if you don't use PostgreSQL's server logging.
This is because archive_command failure is not logged in syslog/eventlog.

I hope PostgreSQL will provide a reliable archiving facility that is ready
to use.

Regards
MauMau

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2013-06-08 14:42:41 Re: UTF-8 encoding problem w/ libpq
Previous Message Noah Misch 2013-06-08 14:30:32 Re: Optimising Foreign Key checks