Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Feike Steenbergen <feikesteenbergen(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master
Date: 2015-09-09 14:11:43
Message-ID: CAHGQGwHoHNmZGZ4UXta8UtxXUu0-BpLk-qNVyiNs0ONX2mnB+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Jul 3, 2015 at 9:26 PM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
> On Fri, Jul 3, 2015 at 4:03 PM, Michael Paquier
> <michael(dot)paquier(at)gmail(dot)com> wrote:
>> On Thu, Jul 2, 2015 at 10:00 PM, Fujii Masao wrote:
>>>> At the beginning of
>>>> StartupXLOG() before parsing recovery.conf we could check for the
>>>> existence of promotion trigger files and unlink them unconditionally.
>>>
>>> There seems to be still race condition: postmaster can receive SIGUSR1
>>> before the startup process removes the promotion trigger file. Then
>>> the variable promote_triggered can be set to true unexpectedly.
>>
>> Ah, yes. That's indeed possible.
>>
>>> So, what about making postmaster remove the trigger file unconditionally
>>> before the startup process is forked, instead? For example, just after
>>> PostmasterMain() calls RemovePgTempFiles()?
>>
>> Sounds fine, no other processes are running at this point.
>
> Thanks for updating the patch!
>
> + * Skip promote signal files. Those files may have
> been created by
> + * pg_ctl to trigger a promotion but they do not
> belong to a base
> + * backup.
> + */
> + if (strcmp(de->d_name, PROMOTE_SIGNAL_FILE) == 0 ||
> + strcmp(de->d_name, FALLBACK_PROMOTE_SIGNAL_FILE) == 0)
> + continue;
>
> Do we really want this? This is not required because the files are
> always removed at the server startup. Also the backup performance
> gain by skipping them would be almost zero since their size is zero.
> That is, no need to add this for performance. OTOH, it's harmless
> to add this. But I don't want to enlarge the "skip file list" for
> pg_basebackup beyond necessary. So I removed this from the patch.
>
> BTW, if we really want this, not only pg_basebackup but also pg_rewind
> would need to skip the files.
>
> + * Remove any trigger file able to trigger promotion, this is safe from
> + * race conditions involving signals as no other processes are running
> + * yet.
> + */
> + RemoveStandbyTrigger();
>
> I changed the function name to RemovePromoteSignalFiles().
>
> I added more detail comment about the race condition that we're
> talking about, into here.
>
> Updated version of the patch attached.

Pushed.

Regards,

--
Fujii Masao

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message kondo 2015-09-10 04:15:18 BUG #13611: test_postmaster_connection failed (Windows, listen_addresses = '0.0.0.0' or '::')
Previous Message hikkis21c 2015-09-09 08:08:47 BUG #13610: upgrade fail