Re: speed up a logical replica setup

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Euler Taveira <euler(at)eulerto(dot)com>
Cc: "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>, Shlok Kyal <shlok(dot)kyal(dot)oss(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, "pgsql-hackers(at)lists(dot)postgresql(dot)org" <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(dot)oss(at)gmail(dot)com>, Fabrízio de Royes Mello <fabriziomello(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>
Subject: Re: speed up a logical replica setup
Date: 2024-04-29 10:58:17
Message-ID: CAA4eK1LT3Z13Dg6p4Z+4adO_EY-ow5CmWfikEmBfL=eVrm8CPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 26, 2024 at 8:24 AM Euler Taveira <euler(at)eulerto(dot)com> wrote:
>
> On Mon, Mar 25, 2024, at 1:06 PM, Hayato Kuroda (Fujitsu) wrote:
>
> The first patch implements a combination of (1) and (2).
>
> ## Analysis for failure 2
>
> According to [2], the physical replication slot which is specified as primary_slot_name
> was not used by the walsender process. At that time walsender has not existed.
>
> ```
> ...
> pg_createsubscriber: publisher: current wal senders: 0
> pg_createsubscriber: command is: SELECT 1 FROM pg_catalog.pg_replication_slots WHERE active AND slot_name = 'physical_slot'
> pg_createsubscriber: error: could not obtain replication slot information: got 0 rows, expected 1 row
> ...
> ```
>
> Currently standby must be stopped before the command and current code does not
> block the flow to ensure the replication is started. So there is a possibility
> that the checking is run before walsender is launched.
>
> One possible approach is to wait until the replication starts. Alternative one is
> to ease the condition.
>
>
> That's my suggestion too. I reused NUM_CONN_ATTEMPTS (that was renamed to
> NUM_ATTEMPTS in the first patch). See second patch.
>

How can we guarantee that the slot can become active after these many
attempts? It still could be possible that on some slow machines it
didn't get activated even after NUM_ATTEMPTS. BTW, in the first place,
why do we need to ensure that the 'primary_slot_name' is active on the
primary?

--
With Regards,
Amit Kapila.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Euler Taveira 2024-04-29 11:53:19 Re: speed up a logical replica setup
Previous Message Antonin Houska 2024-04-29 10:34:03 Join removal and attr_needed cleanup