Re: [RFC] What should we do for reliable WAL archiving?

From: Jeff Janes <jeff(dot)janes(at)gmail(dot)com>
To: MauMau <maumau307(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [RFC] What should we do for reliable WAL archiving?
Date: 2014-03-21 20:16:08
Message-ID: CAMkU=1zr5mcqZYP0eLXRMN1Le66pRNH-hjpzCHZi8racgW9Y-A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 16, 2014 at 3:23 AM, MauMau <maumau307(at)gmail(dot)com> wrote:

> Hello,
>
> The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on
> Windows) as an example for archive_command. However, cp/copy does not sync
> the copied data to disk. As a result, the completed WAL segments would be
> lost in the following sequence:
>
> 1. A WAL segment fills up.
>
> 2. The archiver process archives the just filled WAL segment using
> archive_command. That is, cp/copy reads the WAL segment file from pg_xlog/
> and writes to the archive area. At this point, the WAL file is not
> persisted to the archive area yet, because cp/copy doesn't sync the writes.
>
> 3. The checkpoint processing removes the WAL segment file from pg_xlog/.
>

Note that it takes two checkpoints for this to happen, at least as
currently coded.

Also, if the system crashed badly enough to need media recovery, rather
than just automatic crash recovery, some lost transactions are expected.
Although this could silently break your PITR chain, of a crash happened
and automatic recover used the copy in pg_xlog (which of course was synced)
, while copy in the archive was not synced.

> 4. The OS crashes. The filled WAL segment doesn't exist anywhere any more.
>
> Considering the "reliable" image of PostgreSQL and widespread use in
> enterprise systems, I think something should be done. Could you give me
> your opinions on the right direction? Although the doc certainly escapes
> by saying "(This is an example, not a recommendation, and might not work on
> all platforms.)", it seems from pgsql-xxx MLs that many people are
> following this example.
>

I use this as an example, kind of, but what I am copying to is a network
mount, so any attempts to fsync it there would probably need unavailable
hooks into the remote file system.

Do people really just copy the files from one directory of local storage to
another directory of local storage? I don't see the point of that. But it
seems like this is an area where there are hundreds of use cases, and often
one doesn't see the point of other people's, making it hard to come up with
good examples.

>
> * Improve the example in the documentation.
> But what command can we use to reliably sync just one file?
>
> * Provide some command, say pg_copy, which copies a file synchronously by
> using fsync(), and describes in the doc something like "for simple use
> cases, you can use pg_copy as the standard reliable copy command."
>

The recommendation is to refuse to overwrite an existing file of the same
name, and exit with failure. Which essentially brings archiving to a halt,
because it keeps trying but it will keep failing. If we make a custom
version, one thing it should do is determine if the existing archived file
is just a truncated version of the attempting-to-be archived file, and if
so overwrite it. Because if the first archival command fails with a
network glitch, it can leave behind a partial file.

Cheers,

Jeff

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Noah Misch 2014-03-21 20:58:28 Re: ALTER TABLE lock strength reduction patch is unsafe Reply-To:
Previous Message Simon Riggs 2014-03-21 19:11:14 Re: ALTER TABLE lock strength reduction patch is unsafe Reply-To: