Re: Determine state of cluster (HA)

From: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Determine state of cluster (HA)
Date: 2017-10-13 17:13:34
Message-ID: 2f87424f-fa59-6532-a6af-10593359a92a@commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/12/2017 05:50 PM, Joshua D. Drake wrote:
> -Hackers,

Bumping this.

>
> I had a long call with a firm developing front end proxy/cache/HA for
> Postgres today. Essentially the software is a replacement for PGPool in
> entirety but also supports analytics etc... When I was asking them about
> pain points they talked about the below and I was wondering if this is a
> problem we would like to solve:
>
>  Per your request, here is our failover issue.
>
> 1.  In a modern devops environment, the database should be able to scale
> and morph over time based on need.
> 2.  Tools that are leveraging the database should be able to easily
> discover and potentially control (with permissions) the database.
> Currently, you can discover the master and what nodes are syncing off of
> it, but on a failure, a tool can't easily discover what orchestration
> has done on the back-end to make the cluster whole again, i.e. from the
> slave, you can't discover the master reliably and easily.
>
> The logic that our code now uses is to:
>
> 1.  Find the master
> 2.  Add replication nodes per the master's configuration.
>
> To find a master, we start with a list of candidate nodes that MAY be a
> master at any point, and:
> 1. issue "SELECT pg_is_in_recovery()" to find if it is a slave
> a. If so, use "SELECT pg_read_file('recovery.conf')" to extract the host
> b. Attempt to connect to the host directly, if not...
> c. use the slave and use the hostname via dblink to connect to the
> master, as the hostname , i.e. select * from dblink('" + connInfo + "
> dbname=postgres', 'select inet_server_addr()') AS t(inet_server_addr
> inet).  This is necessary in the event the hostname used in the
> recovery.conf file is not resolvable from the outside.
> d. Use the dblink connection to ID the master node via select
> inet_server_addr();
> e. connect to the IP provided by the master.
> f.  Repeat through nodes until we get a master.
>
> Issues:
> 1.  The dblink call doesn't have a way to specify a timeout, so we have
> to use Java futures to control how long this may take to a reasonable
> amount of time;
> 2.  NAT mapping may result in us detecting IP ranges that are not
> accessible to the application nodes.
> 3.  there is no easy way to monitor for state changes as they happen,
> allowing faster failovers, everything has to be polled based on events;
> 4.  It doesn't support cascading replication very well, although we
> could augment the logic to allow us to map the relationship between nodes.
> 5.  There is no way to connect to a db node with something akin to
> SQL-Server's "application intent" flags, to allow a connection to be
> rejected if we wish it to be a read/write connection.  This helps detect
> the state of the node directly without having to ask any further
> questions of the node, and makes it easier to "stall" during connection
> until a proper connection can be made.
> 6.  The master, on shutdown, will not actually close and disable
> connections as it shuts down, instead, it will issue an error that it is
> shutting down as it does so.
>
> Fundamentally, the biggest issue is that it is very hard to determine
> the state of the cluster by asking all the nodes, in particular in the
> case of a failure.  Some state information is lost that is necessary to
> talk to the cluster moving forward in a reliable manner.
>
>

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
***** Unless otherwise stated, opinions are my own. *****

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2017-10-13 17:38:47 Re: Improve catcache/syscache performance.
Previous Message Robert Haas 2017-10-13 17:11:28 Re: pgsql: Avoid coercing a whole-row variable that is already coerced