v12 and TimeLine switches and backups/restores

From: Stephen Frost <sfrost(at)snowman(dot)net>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: v12 and TimeLine switches and backups/restores
Date: 2020-07-01 04:12:14
Message-ID: 20200701041214.GM3125@tamriel.snowman.net
Lists: pgsql-hackers

Greetings,

One of the changes made to PG's recovery in v12 was to make 'latest' the
default for recovery_target_timeline. That's handy when you're flipping
back and forth between replicas and want everyone to follow along, but
it has made some basic things, like restoring from a backup,
problematic.

Specifically, if you take a backup off a primary and, while that backup
is in progress, some replica is promoted and drops a .history file into
the WAL repo, that backup can no longer be restored with the new
recovery_target_timeline default. What happens is that the restore
process will happily follow the timeline change, even though it happened
before we reached consistency, and then it'll never find the needed
end-of-backup WAL point that would allow us to reach consistency.
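
To make the failure mode concrete, here is a rough Python sketch of the
scenario above. It is a simplified model, not PostgreSQL code: the
function names and the example LSNs are made up for illustration. The
only things taken from PostgreSQL itself are the 'X/Y' hexadecimal LSN
notation and the 'current' and 'latest' values of
recovery_target_timeline.

```python
def parse_lsn(text):
    """Convert a PostgreSQL-style LSN such as '0/3000028' into an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def can_reach_consistency(target, switch_lsn, end_of_backup_lsn):
    """Simplified model: does recovery ever see the end-of-backup point?

    target            -- 'current' or 'latest' (recovery_target_timeline)
    switch_lsn        -- where the promoted replica forked the new timeline
    end_of_backup_lsn -- where the primary ended the backup on the old timeline
    """
    if target == "current":
        # Recovery stays on the timeline the backup was taken on, so the
        # end-of-backup point written by the primary is replayed.
        return True
    if target == "latest":
        # Recovery leaves the old timeline at switch_lsn. Anything the
        # primary wrote after that point sits on a branch that is never
        # replayed, including the end-of-backup point.
        return parse_lsn(switch_lsn) >= parse_lsn(end_of_backup_lsn)
    raise ValueError(f"unsupported target in this sketch: {target!r}")


# Replica promoted while the backup was still running (hypothetical LSNs).
print(can_reach_consistency("latest", "0/3000100", "0/3000200"))
print(can_reach_consistency("current", "0/3000100", "0/3000200"))
```

In this model, the 'latest' case fails only when the switch point
precedes the end of the backup, which is the situation described above.
Setting recovery_target_timeline to 'current' keeps recovery on the
backup's own timeline, which was the behaviour before v12.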

Naturally, a primary isn't ever going to do a timeline (TL) switch, and
we already throw an error during an online backup from a replica if that
replica did a TL switch during the backup, to indicate that the backup
isn't valid.

Attached is an initial draft of a patch to at least give a somewhat
clearer error message when we detect that the user has asked us to
follow a switch to a new timeline before we've reached consistency
(though I had to hack in a check to see if pg_rewind is being used,
since apparently it actually depends on PG following a timeline switch
before reaching consistency).
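
The rule described above can be sketched in a few lines of Python. This
is not the attached patch, which is C code inside the server. The
function name, the exception class and the error text are all
hypothetical, and the sketch only shows the shape of the decision:
refuse a timeline switch that comes before consistency, unless recovery
was set up by pg_rewind.

```python
class RecoveryError(Exception):
    """Hypothetical error type used only by this sketch."""


def check_timeline_switch(reached_consistency, started_by_pg_rewind):
    """Raise if a timeline switch is requested before consistency.

    reached_consistency  -- True once the end-of-backup point was replayed
    started_by_pg_rewind -- True if pg_rewind prepared this data directory
    """
    if reached_consistency:
        # Switching timelines after consistency is the normal case.
        return
    if started_by_pg_rewind:
        # Per the message above, pg_rewind depends on following a
        # timeline switch before consistency, so it is exempted.
        return
    raise RecoveryError(
        "timeline switch requested before the backup reached consistency"
    )


# A restore that hits the switch too early gets a clear error.
try:
    check_timeline_switch(reached_consistency=False, started_by_pg_rewind=False)
except RecoveryError as err:
    print(f"error: {err}")
```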

Thoughts?

Thanks,

Stephen

Attachment: error-on-TL-switch-backup-end-of-backup_v1.patch (text/x-diff, 5.3 KB)
