Re: Sync Rep v19

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Yeb Havinga <yebhavinga(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Sync Rep v19
Date: 2011-03-08 12:05:08
Message-ID: AANLkTiko6-COABo+oVnRJ+t6Vh99FvYAM3Seu30=tnef@mail.gmail.com
Lists: pgsql-hackers

On Mon, Mar 7, 2011 at 4:54 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Mar 6, 2011, at 9:44 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>> On Sun, Mar 6, 2011 at 5:02 PM, Yeb Havinga <yebhavinga(at)gmail(dot)com> wrote:
>>> On Sun, Mar 6, 2011 at 8:58 AM, Fujii Masao <masao(dot)fujii(at)gmail(dot)com> wrote:
>>>>
>>>> If unfortunately all connection slots are used by backends waiting for
>>>> replication, we cannot execute such a function. So it makes more sense
>>>> to introduce something like "pg_ctl standalone" command?
>>>
>>> If it is only for shutdown, maybe pg_ctl stop -m standalone?
>>
>> It's for not only shutdown but also running the primary in standalone mode.
>> So something like "pg_ctl standalone" is better.
>>
>> For now I think that pg_ctl command is better than built-in function because
>> sometimes we might want to wake waiters up even during shutdown in
>> order to cause shutdown to end. During shutdown, the server doesn't
>> accept any new connection (even from the standby). So, without something
>> like "pg_ctl standalone", there is no way to cause shutdown to end.
>
> This sounds like an awful hack to work around a bad design. Surely once shutdown reaches a point where new replication connections can no longer be accepted, any standbys hung on commit need to close the connection without responding to the COMMIT, per previous discussion.  It's completely unreasonable for sync rep to break the shutdown sequence.

Yeah, let's think about how shutdown should work. I'd like to propose the
following. Thoughts?

* Smart shutdown
Smart shutdown should wait for all the waiting backends to be acked, and
should not force them to exit. But this makes shutdown hang indefinitely
if there is no walsender at that time. To allow them to be acked even in
that situation, we need to change the postmaster so that it accepts
replication connections even during smart shutdown (until we reach the
PM_SHUTDOWN_2 state). The postmaster already accepts superuser
connections during smart shutdown so that a backup can be cancelled. So
I don't think that accepting replication connections during smart
shutdown is so ugly.
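
To make the smart shutdown proposal concrete, here is a rough sketch of
the acceptance rule as a small Python model. It is not postmaster code.
Only PM_SHUTDOWN_2 comes from the description above; the other state
names, the connection kinds, and the allow_replication_in_smart flag are
assumptions made up for illustration.

```python
from enum import IntEnum


class PMState(IntEnum):
    # Simplified, hypothetical ordering of postmaster states. Only
    # PM_SHUTDOWN_2 is named in the mail; the others are placeholders.
    PM_RUN = 0
    PM_SMART_WAIT = 1    # smart shutdown requested, backends still running
    PM_SHUTDOWN_2 = 2    # late shutdown: nothing new is accepted


def accepts_connection(state, kind, allow_replication_in_smart=True):
    """Return True if a new connection of the given kind would be accepted.

    kind is one of "normal", "superuser", "replication" (illustrative).
    """
    if state >= PMState.PM_SHUTDOWN_2:
        return False
    if state == PMState.PM_RUN:
        return True
    # Smart shutdown in progress: superusers are let in (existing
    # behaviour, so that a backup can be cancelled); the proposal adds
    # replication connections so that waiting backends can still be acked.
    if kind == "superuser":
        return True
    if kind == "replication":
        return allow_replication_in_smart
    return False
```

With allow_replication_in_smart=False the model behaves like the
situation described above, where a standby cannot reconnect and waiting
backends are never acked.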

* Fast shutdown
I agree with you about fast shutdown. Fast shutdown should cause all the
backends, including waiting ones, to exit immediately. At that time, a
non-acked backend should not return success, according to the definition
of sync rep. So we need to change the backend so that, when it receives
SIGTERM, it removes itself from the waiting queue and exits before
returning success. This change makes waiting backends do the same when
pg_terminate_backend is called. But since they've not been acked yet, it
seems reasonable to prevent them from returning the COMMIT.
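
The fast shutdown behaviour can be sketched the same way. This is a toy
Python model, not backend code: the Backend and WaitQueue classes, their
method names, and the outcome strings are all hypothetical. It only
shows the rule that a waiting backend which gets SIGTERM leaves the
queue and exits without ever reporting the commit as successful.

```python
class Backend:
    """A session that has issued COMMIT and is waiting for a standby ack."""

    def __init__(self, name):
        self.name = name
        self.outcome = None  # "commit_ok" or "terminated" (illustrative)


class WaitQueue:
    """Toy model of the sync rep waiting queue."""

    def __init__(self):
        self._waiting = []

    def wait_for_ack(self, backend):
        self._waiting.append(backend)

    def is_waiting(self, backend):
        return backend in self._waiting

    def standby_ack(self, backend):
        # The standby confirmed the commit: success may now be returned.
        if backend in self._waiting:
            self._waiting.remove(backend)
            backend.outcome = "commit_ok"

    def sigterm(self, backend):
        # Not acked yet: leave the queue and exit without returning success.
        # The same path would be taken for pg_terminate_backend.
        if backend in self._waiting:
            self._waiting.remove(backend)
            backend.outcome = "terminated"

    def fast_shutdown(self):
        # Fast shutdown sends SIGTERM to every backend, waiting or not.
        for backend in list(self._waiting):
            self.sigterm(backend)
```

A backend that was already acked keeps its "commit_ok" outcome; only
those still in the queue at SIGTERM time end up as "terminated".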

Comments? I'll create the patch barring objections.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
