Logical replication from HA cluster

From: Andrey Borodin <x4mmm(at)yandex-team(dot)ru>
To: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Cc: tserakhau(at)yandex-team(dot)ru
Subject: Logical replication from HA cluster
Date: 2020-10-23 12:30:40
Message-ID: 5275E06F-A746-4833-940A-29D3119CCDFC@yandex-team.ru
Lists: pgsql-hackers

Hi!

I'm working on providing smooth failover for a CDC system in an HA cluster.
Currently, logical slots are not replicated, so they do not exist on a replica when we promote it. This makes it impossible to continue change data capture (CDC) from the new primary after failover.

We cannot start logical replication from an LSN different from the LSN of a slot. And we cannot create a slot at an LSN in the past, particularly one before or right after promotion.

This leads to a massive waste of network bandwidth in our installations, because an initial table sync becomes necessary.

We are considering using an extension that creates a replication slot with an LSN in the past [0]. I understand that there might be some caveats with logical replication, but I do not see the scale of the possible implications of this approach. The user gets an error if the WAL has been rotated, or waits if the LSN has not been reached yet; this seems perfectly fine for us. In most of our cases, when the CDC agent detects a failover and goes to the new primary, there are plenty of old WALs to restart CDC.
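
To make the three outcomes above concrete, here is a minimal sketch in Python. It is only an illustration, not the code of the extension in [0]: the function names and the inputs `oldest_available` and `current` are hypothetical, and a real server determines those positions internally. The only PostgreSQL fact it relies on is the textual LSN format, two hexadecimal numbers separated by a slash (high and low 32 bits).

```python
from enum import Enum


class SlotOutcome(Enum):
    ERROR_WAL_REMOVED = "error: required WAL already rotated away"
    WAIT = "wait: requested LSN not reached yet"
    OK = "ok: CDC can restart from the requested LSN"


def parse_lsn(text: str) -> int:
    """Convert a textual LSN ('hi/lo' in hex) to a 64-bit integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def classify(requested: str, oldest_available: str, current: str) -> SlotOutcome:
    """Hypothetical helper: decide what a client asking for a slot at
    `requested` would observe, given the oldest LSN still present in WAL
    and the current LSN of the new primary (both assumed inputs)."""
    req = parse_lsn(requested)
    if req < parse_lsn(oldest_available):
        # WAL needed to decode from this point is gone.
        return SlotOutcome.ERROR_WAL_REMOVED
    if req > parse_lsn(current):
        # The new primary has not produced this LSN yet.
        return SlotOutcome.WAIT
    return SlotOutcome.OK
```

For example, `classify("0/3000000", "0/2000000", "0/5000000")` falls in the retained range and yields `SlotOutcome.OK`, which is the case we expect to hit most of the time after a failover.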

Are there strong reasons why we do not allow creation of slots at given LSNs, possibly within a narrow LSN range (but wider than just GetXLogInsertRecPtr())?

Thanks!

Best regards, Andrey Borodin.

[0] https://github.com/x4m/pg_tm_aux/blob/master/pg_tm_aux.c#L74-L77
