Re: allow online change primary_conninfo

From: Sergei Kornilov <sk(at)zsrv(dot)org>
To: Michael Paquier <michael(at)paquier(dot)xyz>, Andres Freund <andres(at)anarazel(dot)de>
Cc: PostgreSQL mailing lists <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: allow online change primary_conninfo
Date: 2018-12-11 08:59:08
Message-ID: 19414261544518748@iva1-a2ffb02749cf.qloud-c.yandex.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello

thank you for review!

> but I think that this patch should document clearly that if
> primary_conninfo or primary_slot_name are changed then the WAL receiver
> is stopped immediately.
Good idea, will change.

>     /*
> - * replication slot name; is also used for walreceiver to connect with the
> - * primary
> + * replication slot name used by runned walreceiver
>      */
> Not sure that there is much point in updating those comments, because
> it still has the same meaning in the new context.
"is also used" seems outdated for me. With proposed patch this field means only currently active slot, not "also".
Well, depends on RequestXLogStreaming discussion

> -RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr, const char *conninfo,
> - const char *slotname)
> +RequestXLogStreaming(TimeLineID tli, XLogRecPtr recptr)
> That's perhaps a matter of taste, but keeping the current API as-is
> looks cleaner to me, particularly because those fields get copied into
> WalRcv, so I'd rather not change what is done in
> WaitForWALToBecomeAvailable and keep the patch at its simplest shape.
Unfortunally here is race condition if i leave RequestXLogStreaming infrastructure as-is and walreceiver will use this struct for startup.
This case:
- startup process calls RequestXLogStreaming and fill WalRcv conninfo/slotname
- reload primary_conninnfo by SIGHUP
- walreceiver start with new GUC but trying conninfo/slotname from WalRcv struct. If walreceiver is able to connect - it will not restart till another error or primary_conninfo/slotname will be changed again (SIGHUP without change is not enough).

I already had such failures while testing this patch.

If walreceiver will use primary_conninfo GUC at startup - i see no reason to leave *conninfo in RequestXLogStreaming. These parameters will be misleading, we request them, but not using for anything.

> +$node_standby_2->reload; # should have effect without restart
> This does not wait for the change to be effective, so I think that you
> introduce a race condition for slow machines with this test, making it
> unstable.
No, here is no race condition because just after this line we wait this replication slot in upstream pg_catalog.pg_replication_slots (get_slot_xmins routine).
Before reload we have no replication slot and should reconnect here with replication slot. I can add some comment here.

regards, Sergei

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2018-12-11 09:37:28 Re: Pluggable Storage - Andres's take
Previous Message Peter Eisentraut 2018-12-11 08:22:48 Re: [HACKERS] Can ICU be used for a database's default sort order?