Re: WIP: Failover Slots

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Craig Ringer <craig(at)2ndquadrant(dot)com>, Thom Brown <thom(at)linux(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: Failover Slots
Date: 2017-08-12 00:03:19
Message-ID: 20170812000319.4higzrilqyxstbko@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2017-08-02 16:35:17 -0400, Robert Haas wrote:
> I actually think failover slots are quite desirable, especially now
> that we've got logical replication in core. In a review of this
> thread I don't see anyone saying otherwise. The debate has really
> been about the right way of implementing that.

Given that I presumably was one of the people pushing back more
strongly: I agree with that. Besides disagreeing with the proposed
implementation our disagreements solely seem to have been about
prioritization.

I still think we should have a halfway agreed upon *design* for logical
failover, before we introduce a concept that's quite possibly going to
be incompatible with that, however. But that doesn't mean it has to
submitted/merged to core.

> - When a standby connects to a master, it can optionally supply a list
> of slot names that it cares about.
> - The master responds by periodically notifying the standby of changes
> to the slot contents using some new replication sub-protocol message.
> - The standby applies those updates to its local copies of the slots.

> So, you could create a slot on a standby with an "uplink this" flag of
> some kind, and it would then try to keep it up to date using the
> method described above. It's not quite clear to me how to handle the
> case where the corresponding slot doesn't exist on the master, or
> initially does but then it's later dropped, or it initially doesn't
> but it's later created.

I think there's a couple design goals we need to agree upon, before
going into the weeds of how exactly we want this to work. Some of the
axis I can think of are:

- How do we want to deal with cascaded setups, do slots have to be
available everywhere, or not?
- What kind of PITR integration do we want? Note that simple WAL based
slots do *NOT* provide proper PITR support, there's not enough
interlock easily available (you'd have to save slots at the end, then
increment minRecoveryLSN to a point later than the slot saving)
- How much divergence are we going to accept between logical decoding on
standbys, and failover slots. I'm probably a lot closer to closer than
than Craig is.
- How much divergence are we going to accept between infrastructure for
logical failover, and logical failover via failover slots (or however
we're naming this)? Again, I'm probably a lot closer to zero than
craig is.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-08-12 00:17:05 Re: POC: Sharing record typmods between backends
Previous Message Andres Freund 2017-08-11 20:26:29 Re: [postgresql 10 beta3] unrecognized node type: 90