Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
Cc: Feike Steenbergen <feikesteenbergen(at)gmail(dot)com>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: BUG #13368: standby cluster immediately promotes after pg_basebackup from previously promoted master
Date: 2015-07-03 12:26:28
Message-ID: CAHGQGwH0qnvHqxSF6LUXmMCLQxT8NtZHsJN7ij0sK=AW1DC3Aw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, Jul 3, 2015 at 4:03 PM, Michael Paquier
<michael(dot)paquier(at)gmail(dot)com> wrote:
> On Thu, Jul 2, 2015 at 10:00 PM, Fujii Masao wrote:
>>> At the beginning of
>>> StartupXLOG() before parsing recovery.conf we could check for the
>>> existence of promotion trigger files and unlink them unconditionally.
>>
>> There seems to be still race condition: postmaster can receive SIGUSR1
>> before the startup process removes the promotion trigger file. Then
>> the variable promote_triggered can be set to true unexpectedly.
>
> Ah, yes. That's indeed possible.
>
>> So, what about making postmaster remove the trigger file unconditionally
>> before the startup process is forked, instead? For example, just after
>> PostmasterMain() calls RemovePgTempFiles()?
>
> Sounds fine, no other processes are running at this point.

Thanks for updating the patch!

+ * Skip promote signal files. Those files may have
been created by
+ * pg_ctl to trigger a promotion but they do not
belong to a base
+ * backup.
+ */
+ if (strcmp(de->d_name, PROMOTE_SIGNAL_FILE) == 0 ||
+ strcmp(de->d_name, FALLBACK_PROMOTE_SIGNAL_FILE) == 0)
+ continue;

Do we really want this? This is not required because the files are
always removed at the server startup. Also the backup performance
gain by skipping them would be almost zero since their size is zero.
That is, no need to add this for performance. OTOH, it's harmless
to add this. But I don't want to enlarge the "skip file list" for
pg_basebackup beyond necessary. So I removed this from the patch.

BTW, if we really want this, not only pg_basebackup but also pg_rewind
would need to skip the files.

+ * Remove any trigger file able to trigger promotion, this is safe from
+ * race conditions involving signals as no other processes are running
+ * yet.
+ */
+ RemoveStandbyTrigger();

I changed the function name to RemovePromoteSignalFiles().

I added more detail comment about the race condition that we're
talking about, into here.

Updated version of the patch attached.

Regards,

--
Fujii Masao

Attachment Content-Type Size
20150703_ignore_promote_file_v4_fujii.patch text/x-patch 2.2 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2015-07-03 12:56:43 Re: BUG #13126: table constraint loses its comment
Previous Message olivier.gosseaume 2015-07-03 09:02:17 BUG #13484: Performance problem with logical decoding