Re: WIP: Failover Slots

From: Craig Ringer <craig(at)2ndquadrant(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: WIP: Failover Slots
Date: 2016-04-06 13:15:38
Message-ID: CAMsr+YEVW-_wivY8yJQCTiJ20u7V4P_xNWNx55MaJ70mLS7O6g@mail.gmail.com
Lists: pgsql-hackers

A few thoughts on failover slots vs the alternative of pushing catalog_xmin
up to the master via a replica's slot and creating independent slots on
replicas.

Failover slots:
---

+ Failover slots are very easy for applications. They "just work" and are
transparent across failover. This is especially valuable for applications that
aren't complex replication schemes and just want to use logical decoding (see
the sketch after this list).

+ Applications don't have to know what replicas exist or be able to reach
them; transparent failover is easier.

- Failover slots can't be used from a cascading standby (where we can fail
down to the standby's own replicas) because they have to write WAL to
advance the slot position. They'd have to send the slot position update
"up" to the master and then wait for it to be replayed. Not a disaster, though
they'd do extra work on reconnect until a restart_lsn update had replayed. It
would require a whole new feedback-like message on the rep protocol, and
couldn't work at all with archive replication. Ugly as hell.

+ Failover slots exist now, and could be added to 9.6.

- The UI for failover slots can't be re-used for the catalog_xmin push-up
approach to allow replay from failover slots on cascading standbys in 9.7+.
There'd be no way to propagate the creation of failover slots "down" the
replication hierarchy that way, especially not to archive-based standbys the
way failover slots can. So it'd be semantically different and couldn't
re-use the failover slots UI. We'd be stuck with failover slots even if we
also did the other approach later.

+ Will work for recovery of a master PITR-restored up to the latest
recovery point
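
To illustrate the "just work" point above, here's a minimal sketch of the
application workflow failover slots preserve. It uses only the existing slot
functions; how a slot gets marked as a failover slot is up to the patch, so
that detail is omitted here.

    -- On the master: create a logical slot and consume changes as usual.
    SELECT * FROM pg_create_logical_replication_slot('app_slot', 'test_decoding');
    SELECT * FROM pg_logical_slot_get_changes('app_slot', NULL, NULL);

    -- After physical failover, the application just reconnects to the promoted
    -- replica and keeps consuming from the same slot name; no per-replica slot
    -- management is needed.
    SELECT * FROM pg_logical_slot_get_changes('app_slot', NULL, NULL);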

Independent slots on replicas + catalog_xmin push-up
---

With this approach we allow creation of replication slots on a replica
independently of the master. The replica is required to connect to the
master via a slot. We send feedback to the master to advance the replica's
slot on the master to the confirmed_lsn of the most-behind slot on the
replica, thereby pinning the master's catalog_xmin where needed. Alternatively,
we send a new feedback message type that directly sets a catalog_xmin on the
replica's physical slot in the master. Slots are _not_ cloned from master
to replica automatically.
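
As a rough illustration (not from any patch), the value such a feedback
message would carry is just the horizon needed by the replica's own slots,
along the lines of:

    -- On the replica: find the most-behind local logical slot, i.e. the
    -- catalog_xmin / restart_lsn the master must not advance past.
    SELECT slot_name, catalog_xmin, restart_lsn
    FROM pg_replication_slots
    WHERE slot_type = 'logical'
    ORDER BY age(catalog_xmin) DESC, restart_lsn ASC
    LIMIT 1;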

- More complicated for applications to use. They have to create a slot on
the master and on each replica that might be failed over to, and have to
advance all those slots to stop the master from suffering severe catalog
bloat. (But see note below).

- Applications must be able to connect to failover-candidate standbys and
know where they are; it's not automagically handled via WAL. (But see note
below).

- Applications need reconfiguration whenever a standby is rebuilt, moved,
etc. (But see note below).

- Cannot work at all for archive-based replication; it requires a slot from
replica to master.

+ Works with replay from cascading standbys

+ Actually solves one of the problems making logical slots on standbys
unsupported at the moment by giving us a way to pin the master's
catalog_xmin to that needed by a replica.

- Won't work for a standby PITR-restored up to latest.

- Vapourware with zero hope for 9.6

Note: I think the application complexity issues can be solved, to a degree,
by having the replicas run a bgworker-based helper that connects to the
master, clones the master's slots, and then advances them automatically.
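
A hypothetical sketch of what such a helper might do (the slot name, the
placeholder LSN, and the ability to create or advance logical slots on a
standby are assumptions for illustration, not anything that exists today):

    -- 1. On the master, over a normal connection, list the slots the replica
    --    should mirror:
    SELECT slot_name, plugin, confirmed_flush_lsn
    FROM pg_replication_slots
    WHERE slot_type = 'logical';

    -- 2. On the replica, create any slot that doesn't exist locally yet
    --    (this relies on the as-yet-unimplemented ability to create logical
    --    slots on a standby):
    SELECT pg_create_logical_replication_slot('app_slot', 'test_decoding');

    -- 3. Periodically advance each local copy towards the master's confirmed
    --    position. 9.6 has no slot-advance function, so the helper would have
    --    to consume and discard changes up to that LSN (placeholder shown):
    SELECT pg_logical_slot_get_changes('app_slot', '0/16B2F08', NULL);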

Do nothing
---

Drop the idea of being able to follow physical failover on logical slots.

I've already expressed why I think this is a terrible idea. It's hostile to
application developers who'd like to use logical decoding. It makes
integration of logical replication with existing HA systems much harder. It
means we need really solid, performant, well-tested and mature
logical-rep-based HA before we can take logical rep seriously, which is a
long way off given that we can't yet decode in-progress xacts, DDL,
sequences, etc.

Some kind of physical HA for logical slots is needed and will be needed for
some time. Logical rep will be great for selective replication, replication
over WAN, filtered/transformed replication, etc. Physical rep is great for
knowing you'll get exactly the same thing on the replica that you have on
the master, and it'll Just Work.

In any case, "Do nothing" is the same for 9.6 as pursuing the catalog_xmin
push-up idea; in both cases we don't commit anything in 9.6.
