Re: Missing important information in backup.sgml

From: "Gunnar \"Nick\" Bluth" <gunnar(dot)bluth(at)pro-open(dot)de>
To: pgsql-docs(at)postgresql(dot)org
Cc: Stephen Frost <sfrost(at)snowman(dot)net>
Subject: Re: Missing important information in backup.sgml
Date: 2016-11-23 18:24:35
Message-ID: cd880db6-1ff6-8fa7-4868-249421cd6fb2@pro-open.de
Lists: pgsql-docs

On 16.11.2016 at 22:07, Gunnar "Nick" Bluth wrote:
> On 16.11.2016 at 15:36, Stephen Frost wrote:
>> Gunnar, all,
>>
>> * Gunnar "Nick" Bluth (gunnar(dot)bluth(dot)extern(at)elster(dot)de) wrote:
>>> On 16.11.2016 at 11:37, Gunnar "Nick" Bluth wrote:
>>>> I ran into this issue (see patch) a few times over the past years, and
>>>> tend to forget it again (sigh!). Today I had to clean up a few hundred
>>>> GB of unarchived WALs, so I decided to write a patch for the
>>>> documentation this time.
>>>
>>> Uhm, well, the actual problem was a stale replication slot... and
>>> tomatoes on my eyes, it seems ;-/. Ashes etc.!
>>>
>>> However, I still think a warning on (esp. rsync's) RCs >= 128 is worth
>>> considering (see -v2 attached).
>>
>> Frankly, I wouldn't suggest including such wording as it would imply
>> that using a bare rsync command is an acceptable configuration of
>> archive_command. It isn't. At the very least, a bare rsync does
>> nothing to ensure that the WAL has been fsync'd to permanent storage
>> before returning, leading to potential data loss due to the WAL
>> segment being removed by PG before the new segment has been permanently
>> stored.
>
> I for one consider a UPS-backed server in a different DC a pretty good
> starting point, and I reckon many others do as well... obviously it's
> not a belt-and-braces solution, but my guess would be that > 90% of
> users have something similar in place, many of them actually using rsync
> (or scp) one way or the other (or, think WAL-E et al., how do you force
> an fsync on AWS?!?).
> In environments where there's a risk of the WAL segment being
> overwritten before that target server has fsync'd, heck, yeah, you're
> right. But then you'd probably have something quite sophisticated in
> place, and hate to see allegedly random _FATAL_ errors that are _not
> documented outside the source code_ even more. Esp. when you can't tell
> for sure (from the docs) if archiving your WAL segment will be retried
> or not.
>
>> The PG documentation around archive command is, at best, a starting
>> point for individuals who wish to implement their own proper backup
>> solution, not as examples of good practice for production environments.
>
> True. Which doesn't mean there's no room for more hints, like "ok, we
> throw a FATAL error sometimes, but it's not really a problem, you
> know, it's just external software that basically everyone uses at one
> point or another doing odd things sometimes" ;-).
>
> Alas, I've been hunting a red herring today, because when you find your
> pg_xlog cluttered with old files _and_ see FATAL archiving messages in
> your logs, your first thought is not "there's probably a replication slot
> left over", but "uh oh, those archive_command calls failed, so something
> might be somehow stuck now".
>
> I'll try to come up with something more comprehensive, taking your
> comments into account...

So, attached is what I came up with. It's obviously not "complete",
however it points out the RC >= 128 "quirk" and also mentions Stephen's
remarks on rsync (although to get actual _data loss_, you'd have to have
a power outage in the DC caused by your PG server exploding... ;-).
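
To make the two points concrete, here is a minimal Python sketch of a
wrapper that could be called from archive_command instead of a bare
rsync. It is not what the attached patch contains. The script name
archive_wal.py, the function names and the ".tmp" suffix are made up for
illustration, and it assumes a POSIX system with a locally mounted
archive directory. It fsyncs the copied segment and its directory before
reporting success, and it only ever returns 0 or 1 on handled failures,
so the exit status stays well below the RC >= 128 range discussed above.

```python
import os
import shutil
import sys


def archive_segment(src_path, dest_dir):
    """Copy one WAL segment into dest_dir and fsync it before returning."""
    final_path = os.path.join(dest_dir, os.path.basename(src_path))
    if os.path.exists(final_path):
        # Refuse to overwrite a segment that is already in the archive.
        raise FileExistsError(final_path)
    tmp_path = final_path + ".tmp"  # hypothetical temporary name
    with open(src_path, "rb") as src, open(tmp_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
        dst.flush()
        os.fsync(dst.fileno())  # push the file contents to stable storage
    os.replace(tmp_path, final_path)  # atomic rename on the same filesystem
    dir_fd = os.open(dest_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # make the rename itself durable (POSIX only)
    finally:
        os.close(dir_fd)
    return final_path


def main(argv):
    """Return 0 on success and 1 on any handled failure."""
    if len(argv) != 3:
        print("usage: archive_wal.py SRC DEST_DIR", file=sys.stderr)
        return 1
    try:
        archive_segment(argv[1], argv[2])
    except OSError as exc:
        print("archiving failed: %s" % exc, file=sys.stderr)
        return 1
    return 0
```

With a final line "sys.exit(main(sys.argv))" added, this could be wired
up as archive_command = 'python3 /path/to/archive_wal.py %p /mnt/archive'
(both paths are placeholders). A process killed by a signal would still
surface as such, which the sketch deliberately does not try to hide.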

Cheers,
--
Gunnar "Nick" Bluth
RHCE/SCLA

Mobil +49 172 8853339
Email: gunnar(dot)bluth(at)pro-open(dot)de
_____________________________________________________________
In 1984 mainstream users were choosing VMS over UNIX.
Ten years later they are choosing Windows over UNIX.
What part of that message aren't you getting? - Tom Payne

Attachment               Content-Type          Size
archiving_doc_v4.patch   text/plain            2.1 KB
0x3289338C.asc           application/pgp-keys  1.7 KB
