Re: Reload configuration more frequently in apply worker.

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reload configuration more frequently in apply worker.
Date: 2023-05-17 03:04:42
Message-ID: CAA4eK1KRHY8S51u6-JR9riYdCpsAK-YBnKwysmqGKctY3=89Ag@mail.gmail.com
Lists: pgsql-hackers

On Wed, May 17, 2023 at 7:18 AM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> Currently, the main loop of the apply worker looks like the one below [1].
> Since there are two loops, the inner loop keeps receiving and applying
> messages from the publisher until no messages are left. The worker only
> reloads the configuration in the outer loop. This means that if the publisher
> keeps sending messages (it could keep sending multiple transactions), the
> apply worker won't get a chance to update the GUCs.
>

Apart from that, I think that in rare cases the reload can happen after
the apply worker has finished waiting for data but just before it
receives the new replication message; in that case, the worker won't get
a chance to process the reload before applying the new message. Such a
theory could explain the rare BF failure you pointed out later in the
thread. Does that make sense?
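To make that window concrete, here is a minimal, hypothetical C model. It is not PostgreSQL code: reload_pending stands in for ConfigReloadPending, maybe_reload() for the ProcessConfigFile(PGC_SIGHUP) block, and integer "config versions" for GUC values. It assumes, as described above, that the pending-reload check runs after the wait and before the next receive.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical model, not PostgreSQL code. */
static volatile sig_atomic_t reload_pending = 0; /* like ConfigReloadPending */
static int file_version = 1;   /* config version currently in the file */
static int active_version = 1; /* config version the worker has loaded */

/* A SIGHUP: the user edited the config file and signalled the worker. */
static void deliver_sighup(void)
{
    file_version++;
    reload_pending = 1;
}

/* Stand-in for the "if (ConfigReloadPending) ProcessConfigFile()" block. */
static void maybe_reload(void)
{
    if (reload_pending)
    {
        reload_pending = 0;
        active_version = file_version;
    }
}

/*
 * One outer-loop iteration with the reload checked only after the wait.
 * If sighup_in_window is true, the signal arrives after that check but
 * before the next message is received.  Returns the config version the
 * message was applied with.
 */
static int apply_one_message(bool sighup_in_window)
{
    /* ... wait for data ... */
    maybe_reload();             /* existing check, after the wait */
    if (sighup_in_window)
        deliver_sighup();       /* reload requested inside the window */
    /* ... receive the message ... */
    assert(active_version <= file_version);
    return active_version;      /* "apply change" with the loaded config */
}
```

In this model, apply_one_message(true) returns 1 even though the file is already at version 2; the new setting is only picked up on the following iteration.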

> [1]
> for (;;) /* outer loop */
> {
>     for (;;) /* inner loop */
>     {
>         len = walrcv_receive();
>         if (len == 0)
>             break;
>         ...
>         /* apply change */
>     }
>
>     ...
>     if (ConfigReloadPending)
>     {
>         ConfigReloadPending = false;
>         ProcessConfigFile(PGC_SIGHUP);
>     }
>     ...
> }
>
> I think it would be better if the apply worker reflected the user's
> configuration changes sooner. To achieve this, we can add one more
> ProcessConfigFile() call in the inner loop. The patch for the same is
> attached. What do you think?
>
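The proposal can be sketched with a minimal, hypothetical C model. It is not PostgreSQL code: reload_pending, maybe_reload(), and apply_burst() are made-up stand-ins for ConfigReloadPending, the ProcessConfigFile(PGC_SIGHUP) block, and the inner loop, and integer versions stand in for GUC values. It compares applying a burst of messages with and without the extra check in the inner loop.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>

/* Hypothetical model, not PostgreSQL code. */
static volatile sig_atomic_t reload_pending = 0; /* like ConfigReloadPending */
static int file_version = 1;   /* config version currently in the file */
static int active_version = 1; /* config version the worker has loaded */

static void reset_model(void)
{
    reload_pending = 0;
    file_version = active_version = 1;
}

/* Stand-in for the "if (ConfigReloadPending) ProcessConfigFile()" block. */
static void maybe_reload(void)
{
    if (reload_pending)
    {
        reload_pending = 0;
        active_version = file_version;
    }
}

/*
 * Apply a burst of n messages that are all available without blocking
 * (the inner loop).  A SIGHUP arrives right after message sighup_after is
 * applied.  used[i] records the config version message i was applied with.
 */
static void apply_burst(int n, int sighup_after, bool inner_check, int used[])
{
    assert(n > 0 && sighup_after >= 0 && sighup_after < n);
    for (int i = 0; i < n; i++)        /* inner loop */
    {
        if (inner_check)
            maybe_reload();            /* proposed extra check */
        used[i] = active_version;      /* "apply change" */
        if (i == sighup_after)
        {
            file_version++;            /* user edits the config file ... */
            reload_pending = 1;        /* ... and sends SIGHUP */
        }
    }
    maybe_reload();                    /* existing outer-loop check */
}
```

With inner_check false, every message in the burst is applied with version 1 and the reload happens only after the burst ends; with inner_check true, the message right after the SIGHUP already sees version 2. In this model, checking before each apply also covers a reload requested between the outer-loop check and the next receive, at the cost of one flag test per message.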

This appears to somewhat match the third point Tom made in his
email [1].

> BTW, I saw one BF failure [2] (it's very rare and has only happened once in
> 4 months), which I think is due to the infrequent reloads in the apply worker.
>
> The attached TAP test shows how the failure happened.
>

I haven't tried to reproduce it yet, but will try sometime later.
Thanks for your analysis.

[1] - https://www.postgresql.org/message-id/2138662.1623460441%40sss.pgh.pa.us

--
With Regards,
Amit Kapila.
