Re: Patroni vs pgpool II

From: Jehan-Guillaume de Rorthais <jgdr(at)dalibo(dot)com>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: inzamam(dot)shafiq(at)hotmail(dot)com, cyberdemn(at)gmail(dot)com, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Patroni vs pgpool II
Date: 2023-04-07 14:33:42
Message-ID: 20230407163342.28dfb304@karst
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Fri, 07 Apr 2023 21:16:04 +0900 (JST)
Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> wrote:

> >> If node 1 hangs and once it is recognized as "down" by other nodes, it will
> >> not be used without manual intervention. Thus the disaster described above
> >> will not happen in pgpool.
> >
> > Ok, so I suppose **all** connections, scripts, softwares, backups,
> > maintenances and admins must go through Pgpool to be sure to hit the
> > correct primary.
> >
> > This might be acceptable in some situation, but I wouldn't call that an
> > anti-split-brain solution. It's some kind of «software hiding the rogue node
> > behind a curtain and pretend it doesn't exist anymore»
>
> You can call Pgpool-II whatever you like.

I didn't mean to be rude here. Please, accept my apologies if my words offended
you.

I consider "proxy-based" fencing architecture fragile because you just don't
know what is happening on your rogue node as long as a meatware is coming along
to deal with it. Moreover, you must trust your scripts, configurations,
procedures, admins, applications, users, replication, network, Pgpool, etc to
not fail on you in the meantime...

In the Pacemaker world, where everything MUST be **predictable**, the only way
to predict the state of a rogue node is to fence it from the cluster. Either cut
it from the network, shut it down or set up the watchdog so it reset itself if
needed. At the end, you know your old primary is off or idle or screaming in
the void with no one to hear it. It can't harm your other nodes, data or apps
anymore, no matter what.

> Important thing for me (and probably for users) is, if it can solve user's
> problem or not.

In my humble (and biased) opinion, Patroni, PAF or shared storage cluster are
solving user's problem in regard with HA. All with PROs and CONs. All rely on
strong, safe, well known and well developed clustering concepts.

Some consider they are complex pieces of software to deploy and maintain, but
this is because HA is complex. No miracle here.

Solutions like Pgpool or Repmgr are trying hard to re-implement HA concepts
but left most of this complexity and safety to the user discretion.
Unfortunately, this is not the role of the user to deal with such things. This
kind of architecture probably answer a need, a gray zone, where it is good
enough. I've seen similar approach in the past with pgbouncer + bash scripting
calling themselves "fencing" solution [1]. I'm fine with it as far as people
are clear about the limitations.

Kind regards,

[1] eg.
https://www.postgresql.eu/events/pgconfeu2016/sessions/session/1348-ha-with-repmgr-barman-and-pgbouncer/

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2023-04-07 15:47:54 Re: [PATCH] Introduce array_shuffle() and array_sample()
Previous Message Peter J. Holzer 2023-04-07 14:10:20 Re: "PANIC: could not open critical system index 2662" - twice