Re: Streaming replication and WAL archive interactions

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Venkata Balaji N <nag1010(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Borodin Vladimir <root(at)simply(dot)name>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Streaming replication and WAL archive interactions
Date: 2015-05-13 12:53:04
Message-ID: 55534930.4040905@iki.fi
Lists: pgsql-hackers

On 05/13/2015 03:36 PM, Robert Haas wrote:
> On Mon, May 11, 2015 at 12:00 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> And here is a new version of the patch. I kept the approach of using pgstat,
>> but it now only polls pgstat every 10 seconds, and doesn't block to wait for
>> updated stats.
>
> It's not entirely a new problem, but this error message has gotten pretty crazy:
>
> + (errmsg("WAL archival
> (archive_mode=on/always/shared) requires wal_level \"archive\",
> \"hot_standby\", or \"logical\"")));
>
> Maybe: WAL archival cannot be enabled when wal_level is "minimal"
>
> I think the documentation should be explicit about what happens if the
> primary archives a file and dies before the standby gets notified that
> the archiving happened.

Yes, good point.

> The standby, running in shared mode, is then
> promoted. My first guess would be that the standby will end up with
> files that it thinks it needs to archive but, being unable to do so
> because they're already there, they'll live forever in pg_xlog. I
> hope that's not the case.

Hmm. That is exactly what happens. The standby will attempt to archive
them, which will fail, so the archiver will get stuck retrying.

That's not actually a new problem though. Even with a single server
doing archiving, it's possible that you crash just after archive_command
has archived a file, but before it has created the .done file. After
restart, the server will try to archive the file again, which will fail.
But yeah, with this patch, that's much more likely to happen after a
promotion.

Our manual says that archive_command should refuse to overwrite an
existing file. But to work around the double-archival problem, where the
same file is archived twice, it would be even better if it simply
returned success when the file already exists *and has identical
contents*. I don't know how to code that logic in a simple one-liner
though.
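
For illustration only, here is a minimal sketch of that logic as a small
helper script rather than a one-liner. It is not part of the patch. The
names archive_wal, same_contents and main are hypothetical, and it assumes
the archive is a locally mounted directory on a filesystem that supports
hard links. It refuses to overwrite, but reports success when the target
already holds identical bytes.

```python
import os
import shutil
import tempfile


def same_contents(a, b, chunk=1 << 20):
    """Byte-for-byte comparison of two regular files."""
    if os.path.getsize(a) != os.path.getsize(b):
        return False
    with open(a, "rb") as fa, open(b, "rb") as fb:
        while True:
            block_a, block_b = fa.read(chunk), fb.read(chunk)
            if block_a != block_b:
                return False
            if not block_a:  # both files hit EOF together
                return True


def archive_wal(src, dst):
    """Return 0 if dst now holds (or already held) a copy of src, else 1."""
    if os.path.exists(dst):
        # Already archived: succeed only if the contents are identical.
        return 0 if same_contents(src, dst) else 1

    # Copy to a temporary file in the archive directory and flush it to disk.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(dst) or ".", prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as out, open(src, "rb") as inp:
            shutil.copyfileobj(inp, out)
            out.flush()
            os.fsync(out.fileno())
        try:
            # os.link never overwrites: it raises if dst appeared meanwhile.
            os.link(tmp, dst)
        except FileExistsError:
            return 0 if same_contents(src, dst) else 1
    finally:
        os.unlink(tmp)
    return 0


def main(argv):
    """Hypothetical entry point: argv = [prog, source_path, archive_path]."""
    if len(argv) != 3:
        return 2  # wrong usage must never be reported as success
    return archive_wal(argv[1], argv[2])
```

A script file built from this would end with sys.exit(main(sys.argv)) and
be referenced from archive_command with %p and the archive path ending in
%f as its two arguments. A nonzero exit status still makes the archiver
retry, so a file that exists with *different* contents needs manual
attention, as it does today.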

- Heikki