Re: speed up a logical replica setup

From: "Euler Taveira" <euler(at)eulerto(dot)com>
To: "Shlok Kyal" <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc: "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "vignesh C" <vignesh21(at)gmail(dot)com>, "Michael Paquier" <michael(at)paquier(dot)xyz>, "Peter Eisentraut" <peter(at)eisentraut(dot)org>, "Andres Freund" <andres(at)anarazel(dot)de>, "Ashutosh Bapat" <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, "Amit Kapila" <amit(dot)kapila16(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-02-23 02:45:36
Message-ID: bb7ab653-0b9f-4247-8a6c-d3f113343702@app.fastmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Feb 21, 2024, at 5:00 AM, Shlok Kyal wrote:
> I found some issues and fixed those issues with top up patches
> v23-0012 and v23-0013
> 1.
> Suppose there is a cascade physical replication node1->node2->node3.
> Now if we run pg_createsubscriber with node1 as primary and node2 as
> standby, pg_createsubscriber will be successful but the connection
> between node2 and node3 will not be retained and log og node3 will
> give error:
> 2024-02-20 12:32:12.340 IST [277664] FATAL: database system
> identifier differs between the primary and standby
> 2024-02-20 12:32:12.340 IST [277664] DETAIL: The primary's identifier
> is 7337575856950914038, the standby's identifier is
> 7337575783125171076.
> 2024-02-20 12:32:12.341 IST [277491] LOG: waiting for WAL to become
> available at 0/3000F10
>
> To fix this I am avoiding pg_createsubscriber to run if the standby
> node is primary to any other server.
> Made the change in v23-0012 patch

IIRC we already discussed the cascading replication scenario. Of course,
breaking a node is not good that's why you proposed v23-0012. However,
preventing pg_createsubscriber to run if there are standbys attached to it is
also annoying. If you don't access to these hosts you need to (a) kill
walsender (very fragile / unstable), (b) start with max_wal_senders = 0 or (3)
add a firewall rule to prevent that these hosts do not establish a connection
to the target server. I wouldn't like to include the patch as-is. IMO we need
at least one message explaining the situation to the user, I mean, add a hint
message. I'm resistant to a new option but probably a --force option is an
answer. There is no test coverage for it. I adjusted this patch (didn't include
the --force option) and add a test case.

> 2.
> While checking 'max_replication_slots' in 'check_publisher' function,
> we are not considering the temporary slot in the check:
> + if (max_repslots - cur_repslots < num_dbs)
> + {
> + pg_log_error("publisher requires %d replication slots, but
> only %d remain",
> + num_dbs, max_repslots - cur_repslots);
> + pg_log_error_hint("Consider increasing max_replication_slots
> to at least %d.",
> + cur_repslots + num_dbs);
> + return false;
> + }
> Fixed this in v23-0013

Good catch!

Both are included in the next patch.

--
Euler Taveira
EDB https://www.enterprisedb.com/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message shveta malik 2024-02-23 03:05:44 Re: Synchronizing slots from primary to standby
Previous Message Peter Smith 2024-02-23 02:27:51 Re: Add publisher and subscriber to glossary documentation.