Re: Better Upgrades

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: David Fetter <david(at)fetter(dot)org>
Cc: PostgreSQL Development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Better Upgrades
Date: 2018-02-06 00:45:37
Message-ID: CAMsr+YHYMfq7f36nsObuzrnOUKGSyGtieh7rb36-+vP9AHw=7g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 6 February 2018 at 08:09, David Fetter <david(at)fetter(dot)org> wrote:

> Folks,
>
> While chatting with Bruce about how to make something better than
> pg_upgrade, we (and by "we," I mean mostly Bruce) came up with the
> following.
>
> What needs improvement:
>
> - pg_upgrade forces a down time event, no matter how cleverly it's done.
> - pg_upgrade is very much a blocker for on-disk format changes.
>
> The proposal:
>
> - Add a new script--possibly Perl or Bash, which would:
> - Initdb a new cluster with the new version of PostgreSQL and a
> different port.
> - Start logical replication from the old version to the new
> version.
> - Poll until a pre-determined default amount of replication lag was
> observed, then:
> * Issue an ALTER SYSTEM on the new server to change its port to the
> old server's
> * Issue a pg_ctl stop -w to the old server
> * Issue a pg_ctl restart on the new server
> * Happiness!
>
> Assumptions underlying it:
>
> - Disk and similar resources are cheap enough for most users that
> doubling up during the upgrade is feasible.
> - The default upgrade path should require exactly one step.
> - Errors do not, by and large, have the capacity to violate an SLA.
>
> The proposal has blockers:
>
> - We don't actually have logical decoding for DDL, although I'm given
> to understand that Álvaro Herrera has done some yeoman follow-up
> work on Dimitri Fontaine's PoC patches.
>

Yep, some DDL support would be key. Lots of apps expect to do DDL online
(automatic migrations, etc), and after all it's been one of Pg's selling
points for a long time. Not having it means people have to do much more
prep - and *will* ignore that prep and break things if you expose it as an
easy to use tool.

You will find dealing with DDL that does full table rewrites to be a
challenge. Especially ALTER TABLEs that add a DEFAULT using a non-IMMUTABLE
expression and ALTER TYPE ... USING with a non-IMMUTABLE expression. But
even for the immutable cases some work is needed to let us cleanly
replicate the DDL.

Some DDL may initially need to be disallowed because it plays poorly with
logical decoding, e.g. CREATE INDEX CONCURRENTLY. It's fixable, it just
requires special case handling where the apply side understands that
specific statement and knows how to recover if we have an error partway
through applying it.

We (2ndQuadrant) routinely do online migrations like this using pglogical,
which is a much more capable tool than in-core logical replication
currently is. But it still requires careful preparation and application
qualification, and preferably a trial run. Improvements in schema change /
DDL handling would be needed to address that.

Logical decoding doesn't support a number of postgres features, which still
rules its use out for some customers. No sequence support (though pglogical
works around that with periodic sequence sync). No pg_largeobject support.
Probably more I'm forgetting.

Long txns and txns that change huge numbers of rows, especially if they
also dirty the catalogs (even temp tables!) are a challenge. Performance
isn't great for them, but more importantly we don't start decoding them
until they commit so the replication latency for them can be very large.
Care is required to time a cutover so you don't land up stopping writes to
the old db then waiting ages before the new db is caught up.

> - We don't have logical decoding for DCL (GRANT/REVOKE)
>

In general there are issues with any command affecting the cluster as a
whole: capturing them, ensuring replay order between them if run in
different dbs, etc.

The simplest option would be to disallow them during migration (make them
ERROR). Or possibly restrict them to a single DB where we permit them and
add them to a DDL replication queue.

If you let them run in any db and capture them wherever they ran you run
into order-of-apply issues because the logical change streams aren't
synchronised between the DBs.

>
> We also came up with and, we believe, addressed an important issue,
> namely how to ensure continuity. When we issue a `pg_ctl stop -w`,
> that's short for "Cancel current commands and stop cleanly." At this
> point, the new server will not have WAL to replay, so a pg_ctl restart
> will load the new configuration and come up pretty much immediately,
> and the next try will find a brand new server without a down time
> event.

You'll need to be able to make sure the old server stays up after the last
user commit for long enough to flush all pending WAL to the new server.
There may be committed changes that the new server hasn't applied yet.

If the old server crashes during the grace period you'll have to restart it
and retry, since you don't know for sure if the new server got all the
changes.

You'll want some kind of read-only mode where you can ensure that exactly
one server is writeable by user queries at a time. And to prevent conflicts
there'll be some period where both are read-only during the final catchup
phase; you have to make the old server read-only, wait until the new server
applies pending changes, then make the new server read/write. Some care
about when that's done will minimise the write-downtime. And users who're
willing to accept conflicts can always force read/write earlier on the new
server. But only if they're not using SEQUENCEs.

The cut-over period is needed to make sure all sequences are fully synced
over too.

And if 2PC is in use, the cutover must be done during a period where there
are no prepared txns on the old master, since we don't currently decode and
send txns until COMMIT PREPARED time. So we'd lose
prepared-but-not-committed txns if we cut over while they existed.

--
Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2018-02-06 01:18:55 Re: [HACKERS] [PATCH] Lockable views
Previous Message Tomas Vondra 2018-02-06 00:25:00 Re: WIP: BRIN multi-range indexes