Quick Links

Re: Determine state of cluster (HA)

From:	"Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>
To:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Determine state of cluster (HA)
Date:	2017-10-13 17:13:34
Message-ID:	2f87424f-fa59-6532-a6af-10593359a92a@commandprompt.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On 10/12/2017 05:50 PM, Joshua D. Drake wrote:
> -Hackers,

Bumping this.

>
> I had a long call with a firm developing front end proxy/cache/HA for
> Postgres today. Essentially the software is a replacement for PGPool in
> entirety but also supports analytics etc... When I was asking them about
> pain points they talked about the below and I was wondering if this is a
> problem we would like to solve:
>
> Per your request, here is our failover issue.
>
> 1. In a modern devops environment, the database should be able to scale
> and morph over time based on need.
> 2. Tools that are leveraging the database should be able to easily
> discover and potentially control (with permissions) the database.
> Currently, you can discover the master and what nodes are syncing off of
> it, but on a failure, a tool can't easily discover what orchestration
> has done on the back-end to make the cluster whole again, i.e. from the
> slave, you can't discover the master reliably and easily.
>
> The logic that our code now uses is to:
>
> 1. Find the master
> 2. Add replication nodes per the master's configuration.
>
> To find a master, we start with a list of candidate nodes that MAY be a
> master at any point, and:
> 1. issue "SELECT pg_is_in_recovery()" to find if it is a slave
> a. If so, use "SELECT pg_read_file('recovery.conf')" to extract the host
> b. Attempt to connect to the host directly, if not...
> c. use the slave and use the hostname via dblink to connect to the
> master, as the hostname , i.e. select * from dblink('" + connInfo + "
> dbname=postgres', 'select inet_server_addr()') AS t(inet_server_addr
> inet). This is necessary in the event the hostname used in the
> recovery.conf file is not resolvable from the outside.
> d. Use the dblink connection to ID the master node via select
> inet_server_addr();
> e. connect to the IP provided by the master.
> f. Repeat through nodes until we get a master.
>
> Issues:
> 1. The dblink call doesn't have a way to specify a timeout, so we have
> to use Java futures to control how long this may take to a reasonable
> amount of time;
> 2. NAT mapping may result in us detecting IP ranges that are not
> accessible to the application nodes.
> 3. there is no easy way to monitor for state changes as they happen,
> allowing faster failovers, everything has to be polled based on events;
> 4. It doesn't support cascading replication very well, although we
> could augment the logic to allow us to map the relationship between nodes.
> 5. There is no way to connect to a db node with something akin to
> SQL-Server's "application intent" flags, to allow a connection to be
> rejected if we wish it to be a read/write connection. This helps detect
> the state of the node directly without having to ask any further
> questions of the node, and makes it easier to "stall" during connection
> until a proper connection can be made.
> 6. The master, on shutdown, will not actually close and disable
> connections as it shuts down, instead, it will issue an error that it is
> shutting down as it does so.
>
> Fundamentally, the biggest issue is that it is very hard to determine
> the state of the cluster by asking all the nodes, in particular in the
> case of a failure. Some state information is lost that is necessary to
> talk to the cluster moving forward in a reliable manner.
>
>

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL Centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://pgconf.us
***** Unless otherwise stated, opinions are my own. *****

In response to

Determine state of cluster (HA) at 2017-10-13 00:50:40 from Joshua D. Drake

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2017-10-13 17:38:47	Re: Improve catcache/syscache performance.
Previous Message	Robert Haas	2017-10-13 17:11:28	Re: pgsql: Avoid coercing a whole-row variable that is already coerced