Here is an updated patch to fix some build failures. No feature changes.

On 14.12.21 23:12, Peter Eisentraut wrote:
> On 31.10.21 11:08, Peter Eisentraut wrote:
>> I want to reactivate $subject. I took Petr Jelinek's patch from [0],
>> rebased it, added a bit of testing. It basically works, but as
>> mentioned in [0], there are various issues to work out.
>>
>> The idea is that the standby runs a background worker to periodically
>> fetch replication slot information from the primary. On failover, a
>> logical subscriber would then ideally find up-to-date replication
>> slots on the new publisher and can just continue normally.
>
>> So, again, this isn't anywhere near ready, but there is already a lot
>> here to gather feedback about how it works, how it should work, how to
>> configure it, and how it fits into an overall replication and HA
>> architecture.
>
> Here is an updated patch. The main changes are that I added two
> configuration parameters. The first, synchronize_slot_names, is set on
> the physical standby to specify which slots to sync from the primary. By
> default, it is empty. (This also fixes the recovery test failures that
> I had to disable in the previous patch version.) The second,
> standby_slot_names, is set on the primary. It holds back logical
> replication until the listed physical standbys have caught up. That
> way, when failover is necessary, the promoted standby is not behind the
> logical replication consumers.
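
To make the hold-back rule for standby_slot_names concrete, here is a rough sketch in Python. It is not code from the patch. It assumes the primary knows, for each physical slot, the LSN that standby has flushed, and that an unknown slot counts as having flushed nothing. The function names and the slot names in the usage note are made up for illustration.

```python
def parse_lsn(text):
    """Convert a pg_lsn string such as '0/3000028' into an integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def logical_send_limit(standby_slot_names, flushed_lsn_by_slot):
    """Return the highest LSN that may be sent to logical consumers.

    Returns None when standby_slot_names is empty, meaning no hold-back.
    Assumption: a listed slot with no known position counts as LSN 0,
    so logical replication waits until that standby reports in.
    """
    if not standby_slot_names:
        return None
    return min(flushed_lsn_by_slot.get(name, 0) for name in standby_slot_names)
```

With hypothetical slots "s1" at 0/3000060 and "s2" at 0/3000028, the limit is the position of "s2", so logical consumers never get ahead of the slowest listed standby.
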
>
> In principle, this works now, I think. I haven't made much progress in
> creating more test cases for this; that's something that needs more
> attention.
>
> It's worth pondering what the configuration language for
> standby_slot_names should be. Right now, it's just a list of slots that
> all need to be caught up. More complicated setups are conceivable.
> Maybe you have standbys S1 and S2 that are potential failover targets
> for logical replication consumers L1 and L2, and also standbys S3 and S4
> that are potential failover targets for logical replication consumers L3
> and L4. Viewed like that, this setting could be a replication slot
> setting. The setting might also have some relationship with
> synchronous_standby_names. Like, if you have synchronous_standby_names
> set, then that's a pretty good indication that you also want some or all
> of those standbys in standby_slot_names. (But note that one lists slot
> names and the other lists application names.) So there are a variety of
> possibilities.
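
As a rough illustration of the "replication slot setting" variant from the S1..S4 example, here is a sketch in Python. Nothing like this exists in the patch. The per-slot mapping, the function name, and the plain integers standing in for LSNs are all assumptions made for the example.

```python
# Hypothetical per-logical-slot setting: each logical slot lists the
# physical standby slots it has to wait for (the S1..S4 / L1..L4 example).
STANDBYS_FOR_LOGICAL_SLOT = {
    "l1": ["s1", "s2"],
    "l2": ["s1", "s2"],
    "l3": ["s3", "s4"],
    "l4": ["s3", "s4"],
}


def send_limit_for(logical_slot, standby_lsn):
    """Return the LSN up to which this logical slot may send changes.

    standby_lsn maps physical slot name to flushed LSN (plain integers
    here). Returns None if the logical slot has no standbys configured,
    meaning it is not held back at all.
    """
    standbys = STANDBYS_FOR_LOGICAL_SLOT.get(logical_slot, [])
    if not standbys:
        return None
    # Wait for the slowest standby in this slot's own group only.
    return min(standby_lsn.get(name, 0) for name in standbys)
```

Compared with a single global list, a lagging S3 would hold back only L3 and L4 here, and L1 and L2 would keep streaming.
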