Re: Patroni vs pgpool II

From: Ron <ronljohnsonjr(at)gmail(dot)com>
To: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Patroni vs pgpool II
Date: 2023-04-07 11:12:22
Message-ID: c8016b52-f6c0-d1d9-ee76-8ed22cbfad12@gmail.com
Lists: pgsql-general

On 4/7/23 05:46, Jehan-Guillaume de Rorthais wrote:
> On Fri, 07 Apr 2023 18:04:05 +0900 (JST)
> Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> wrote:
>
>>> And I believe that's part of what Cen was complaining about:
>>>
>>> «
>>> It is basically a daemon glued together with scripts for which you are
>>> entirely responsible. Any small mistake in failover scripts and the
>>> cluster enters a broken state.
>>> »
>>>
>>> If you want to build something clean, including fencing, you'll have to
>>> handle/develop it yourself in scripts.
>> That's a design decision. This gives maximum flexibility to users.
> Sure, no problem with that. But people have to realize that the downside is
> that it leaves the whole complexity and reliability of the cluster in the
> hands of the administrator. And these tasks are much more complicated and
> racy than a simple node promotion.
>
> Even dealing with a simple vIP can become a nightmare if not done correctly.
>
>> Please note that we provide step-by-step installation/configuration
>> documents which have been used by production systems.
>>
>> https://www.pgpool.net/docs/44/en/html/example-cluster.html
> These scripts rely on SSH, which is really bad. What if you have an SSH
> failure in the mix?
>
> Moreover, even if SSH weren't a weakness by itself, the script doesn't even
> try to shut down the old node or stop the old primary.

That does not matter when only PgPool does the writing to the database.
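
To make that concrete, here's a toy sketch (plain Python, nothing to do with
pgpool's real failover_command or internals): when every client write goes
through the pool, failover amounts to detaching the old primary from routing
and promoting the standby, so clients can no longer reach the old node even
if nobody SSHes in to stop it.

    # Toy model, not pgpool code: all client writes go through the pool, so
    # "failover" is just re-routing: detach the dead primary, promote a standby.
    from __future__ import annotations
    from dataclasses import dataclass, field

    @dataclass
    class Backend:
        name: str
        role: str            # "primary" or "standby"
        attached: bool = True

    @dataclass
    class Pool:
        backends: list[Backend] = field(default_factory=list)

        def writable_backend(self) -> Backend | None:
            # Clients can only write to a backend that is attached and primary.
            for b in self.backends:
                if b.attached and b.role == "primary":
                    return b
            return None

        def failover(self, failed: str, promote: str) -> None:
            for b in self.backends:
                if b.name == failed:
                    b.attached = False   # old primary simply drops out of routing
                elif b.name == promote:
                    b.role = "primary"   # standby takes over the writes

    pool = Pool([Backend("s0", "primary"), Backend("s1", "standby")])
    pool.failover(failed="s0", promote="s1")
    print(pool.writable_backend().name)  # "s1"; s0 may still be up, but unreachable through the pool

Of course the sketch ignores anything connected to s0 directly, outside the
pool; that gap is where the fencing argument above comes from.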

> You can add to the mix that both Pgpool and SSH rely on TCP for availability
> checks and actions. You'd better have very low TCP timeouts/retries...
>
> When a service loses quorum on a resource, it is supposed to shut down as
> fast as possible... or even self-fence using a watchdog device if the
> shutdown action doesn't return fast enough.
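
For what it's worth, the rule described there is easy to sketch. Everything
below (visible_peers(), stop_service(), hard_fence()) is a made-up
placeholder; in practice Patroni, Pacemaker or pgpool's watchdog implements
this internally.

    # Sketch of "lost quorum: stop fast, else self-fence".
    import time

    PEERS = {"s0", "s1", "s2"}     # full cluster membership
    STOP_DEADLINE = 5.0            # seconds allowed for a clean shutdown

    def visible_peers() -> set:
        return {"s0"}              # pretend this node can only see itself

    def stop_service() -> bool:
        return False               # pretend the clean stop is hanging

    def hard_fence() -> None:
        raise SystemExit("self-fence: clean stop exceeded the deadline")

    def on_membership_change() -> None:
        if len(visible_peers()) > len(PEERS) // 2:
            return                 # still have quorum, nothing to do
        deadline = time.monotonic() + STOP_DEADLINE
        while time.monotonic() < deadline:
            if stop_service():
                return             # stopped cleanly in time, no fencing needed
            time.sleep(0.5)
        hard_fence()               # shutdown didn't return fast enough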

Scenario:
S0 - Running Postgresql as primary, and also PgPool.
S1 - Running Postgresql as secondary, and also PgPool.
S2 - Running only PgPool.  Has the VIP.

There's no /need/ for Postgresql or PgPool on server 0 to shut down if it
loses contact with S1 and S2, since those two will also notice that S0 has
disappeared.  In that case, they'll vote S0 into degraded state and promote
S1 to be the Postgresql primary.
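
Back-of-the-envelope, that vote is just a majority count (node names from the
scenario above; the helper is made up, pgpool's watchdog does the real
counting internally):

    NODES = {"s0", "s1", "s2"}

    def has_majority(reachable: set) -> bool:
        return len(reachable) > len(NODES) // 2

    reachable = {"s1", "s2"}              # S0 dropped off the network
    if has_majority(reachable):
        degraded = NODES - reachable      # {"s0"} is voted into degraded state
        print("degrade", degraded, "and promote s1 to primary")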

A good question is what happens when S0 and S1 lose connection to S2
(meaning that S2 loses connection to them, too).  S0 and S1 then "should"
vote that S0 take over the VIP.  But, if S2 is still up and can connect to
"the world", does it voluntarily decide to give up the VIP since it's all alone?

--
Born in Arizona, moved to Babylonia.
