Re: Bad recovery: no pg_xlog/RECOVERYXLOG

From: Mark Kirkwood <mark(dot)kirkwood(at)catalyst(dot)net(dot)nz>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Bad recovery: no pg_xlog/RECOVERYXLOG
Date: 2017-11-07 02:49:14
Message-ID: 118a278b-957f-20aa-70b3-9f37018acdea@catalyst.net.nz
Lists: pgsql-admin

On 07/11/17 02:37, Stephen Frost wrote:
> Mark,
>
> * Mark Kirkwood (mark(dot)kirkwood(at)catalyst(dot)net(dot)nz) wrote:
>> On 03/11/17 00:11, Stephen Frost wrote:
>>> Sure, that'll work much of the time, but that's about like saying that
>>> PG could run without fsync being enabled much of the time and everything
>>> will be ok. Both are accurate, but hopefully you'll agree that PG
>>> really should always be run with fsync enabled.
>> It is completely different - this is a 'straw man' argument, and
>> just serves to confuse this discussion.
> I don't see it as any different at all. The point I was trying to make
> there is that there's a minimum requirement for backups, just as there
> is for ACID compliance, and any solution needs to meet that minimum to
> be considered.

Ok and apologies - I thought you were going all 'schoolboy debating' on
me :-) . I'll discuss how I'm seeing this:

In the case of a db server running with fsync off, one crash and it may
not be able to be restarted - ever, so pretty severe loss of service.

In the case of a backup server crashing immediately after a backup
(assuming archive logs and backup going to same host for simplicity),
then *if undetected* it could mean that later you cannot restore this
backup - very bad...so in that case I agree with you. However detection
(i.e. monitoring) is essential, otherwise a meticulously fsync'd set of
WAL can be lost or corrupted by the various usual suspects too (bad
ram/hba/disk...) - with the same result. So assuming we have monitoring
doing its thing, after the backup server crashes then missing or damaged
WAL can be retrieved from our still running db server - or if they have
been recycled, then we need to do another backup. No loss of service.
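
To make the monitoring idea concrete, here is a minimal sketch of a gap
check over archived WAL file names. It assumes the default 16MB segment
size on PostgreSQL 9.3 or later (so the last 8 hex digits of a name run
from 00000000 to 000000FF), and the function names are made up for this
example. The first and last segment a backup needs would typically be read
from the START/STOP WAL LOCATION lines of its backup history file.

```python
import re

# Assumption: default 16MB WAL segments on PostgreSQL 9.3 or later, so the
# last 8 hex digits of a segment name run from 00000000 to 000000FF.
SEGMENTS_PER_LOG = 0x100
WAL_NAME = re.compile(r"^[0-9A-F]{24}$")


def segment_number(name):
    """Convert a 24-hex-digit WAL file name into a linear segment number."""
    log = int(name[8:16], 16)
    seg = int(name[16:24], 16)
    return log * SEGMENTS_PER_LOG + seg


def segment_name(timeline, number):
    """Inverse of segment_number() for a given timeline."""
    log, seg = divmod(number, SEGMENTS_PER_LOG)
    return "%08X%08X%08X" % (timeline, log, seg)


def missing_segments(archived_names, first, last):
    """Return the segments from first to last (inclusive) that are absent
    from archived_names. Only the timeline of `first` is considered."""
    timeline = int(first[:8], 16)
    present = {
        n for n in archived_names
        if WAL_NAME.match(n) and int(n[:8], 16) == timeline
    }
    return [
        segment_name(timeline, n)
        for n in range(segment_number(first), segment_number(last) + 1)
        if segment_name(timeline, n) not in present
    ]
```

A non-empty result is the signal to re-fetch those segments from the
still-running db server, or to take a fresh backup if they have already
been recycled. Note this only checks presence, not that the contents are
intact.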

>> The crux of your argument seems to be concerning the synchronization
>> between pg_basebackup finishing and being sure you have the required
>> archive logs. Now just so we are all clear, when pg_basebackup ends
>> it essentially calls do_pg_stop_backup (from xlog.c) which ensures
>> that all required WAL files are archived, or to be precise here
>> makes sure archive_command has been run successfully for each
>> required WAL file.
> pg_basebackup talks the replication protocol, to be clear, and sends a
> BASE_BACKUP message, of which one of the options is 'NOWAIT' to indicate
> if the server should wait until all of the WAL has been archived.
> Typically, pg_basebackup does send a 'NOWAIT' to tell the server to not
> hold up the final message until all of the WAL has been archived,
> because it's handling the verification of the WAL having been archived.
> In the unusual case that WAL isn't included with the pg_basebackup it
> looks like it would wait for the archive_command to complete, which is
> better than I had thought (and hadn't noticed on my first glance through
> the code), though that does depend on a functional and perfect
> archive_command, and there's no shortage of reasons for why that might
> not be the case at the time the backup is happening.
>
> That's an awful lot of action-at-a-distance hope for me to be
> comfortable with, however. A backup solution really does need to verify
> that the WAL has been completely and reliably stored, as discussed in
> the documentation, before claiming a backup is valid, and there's
> basically no reason not to unless the tool you've chosen to use makes
> that particularly difficult (even if not *technically* impossible, given
> enough effort). If your solution is built on the assumption that WAL
> archiving is always working and there's no check happening during backup
> to verify that you've got all the WAL then I have serious doubts about
> it being reliable. If you're independently monitoring that all WAL has
> been archived, that's certainly helpful, but I don't consider that to be
> a complete substitute for making sure that you've got all of the WAL for
> a given backup.
>
>> Your entire argument seems about whether said WAL is fsync'ed to
>> disk, and how this is impossible to ensure in a shell script.
> [...]
>> So it is clearly *possible*.
> Yes, it's possible, but it's not something I'd recommend doing and none
> of your arguments have made me any more likely to recommend trying to
> ensure a proper backup has completed using shell scripts. What I fail
> to understand is your insistence on it being a good idea. I've seen
> lots and lots of attempts at it, even made some myself, and have come to
> the generally agreed upon conclusion that it's both a bad idea to hack
> together your own backup solution for PG and that, even if you do want
> to try, using shell scripts to attempt to accomplish it is a bad idea.
> There are much better solutions out there which are really what folks
> should be using. I'm not against using pg_basebackup either, but if
> you're using it, let it handle the archiving because it does verify that
> all of the WAL has been archived properly.
>
>> Actually I was helping him get a *reliable* backup system, I think
>> you misunderstood how swift changes the picture compared to a single
>> server/single disk design.

Ok, so I think we have moved closer to seeing each other's point of view
- been an interesting discussion so far!

> I do understand the goals of things like swift and s3 and the intent
> behind them to provide a better store than local disks, and I'm not
> against using them, to be clear, but they only address one of the
> requirements that I outlined for a reliable backup solution. I mention
> both requirements consistently to, hopefully, ensure that those coming
> along later to read these threads remember that it's more than just
> making sure that you verify all the WAL has been archived during a
> backup- but that they've been archived and actually fsync'd or written
> out to reliable storage.
>
>

Here I think you have still not grasped that (e.g) swift achieves *both*
of these - without you attempting to call fsync after your uploads. (for
instance in our swift cluster, you would have to have all three data
centers down to lose access to your uploaded WAL...and we run with the
various storage mounted with barrier=on so the files will be there when
the centers return) - note that the swift PUT operation (which is what
upload is doing) does fsync at the end.
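
For anyone reading later who wonders what "fsync at the end" of a write
amounts to on a POSIX filesystem, here is a generic sketch of the usual
write, fsync, rename pattern. This is not swift's code, and store_durably
is a name invented for this example; it only illustrates the kind of
guarantee being discussed for a single local file.

```python
import os
import tempfile


def store_durably(data, dest_path):
    """Write data to dest_path so it survives a crash once this returns."""
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Write to a temporary file in the same directory so the final rename
    # stays within one filesystem and is atomic.
    fd, tmp_path = tempfile.mkstemp(dir=dest_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # force file contents to stable storage
        os.replace(tmp_path, dest_path)
    except BaseException:
        # Clean up the partial temporary file, then re-raise the error.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # fsync the directory so the rename itself is durable (POSIX systems).
    dir_fd = os.open(dest_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

A plain "cp" in an archive_command does none of the fsync steps, which is
the gap the earlier part of this thread was about.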

regards

Mark
