Patch: Implement failover on libpq connect level.

From: Victor Wagner <vitus(at)wagner(dot)pp(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Patch: Implement failover on libpq connect level.
Date: 2015-10-14 10:41:51
Message-ID: 20151014104151.GA6744@wagner.pp.ru
Lists: pgsql-hackers pgsql-jdbc

On 2015.08.18 at 07:18:50 +0300, Victor Wagner wrote:

> Rationale
> =========
>
> Since the introduction of WAL-based replication into PostgreSQL, it has
> been possible to create high-availability and load-balancing clusters.
>
> However, there is no support for failover in the client libraries. So the
> only way to provide failover that is transparent to the client application
> is IP address migration. This approach has limitations: it requires that
> the master and backup servers reside in the same subnet, and it may be
> infeasible for other reasons.
>
> Commercial RDBMSs, such as Oracle, employ a more flexible approach. They
> allow multiple servers to be specified in the connect string, so if the
> primary server is not available, the client library tries to connect to
> the other ones.
>
> This approach allows geographically distributed failover clusters to be
> used, and is also a cheap way to implement load balancing (which is not
> possible with IP address migration).
>

Attached is a patch which implements client library failover and
load balancing as described in the proposal
<20150818041850(dot)GA5092(at)wagner(dot)pp(dot)ru>.

This patch implements the following functionality:

1. Several hosts can be specified in the connect string, either in
URL style (separated by commas) or in param=value form (several host
parameters).

2. Each host can be followed by a port specifier, separated by a colon.
A port specified this way takes precedence over the port parameter or
the default port, for that particular host only.

3. There is a hostorder parameter with two possible values, 'sequential'
and 'random' (default: sequential). The 'parallel' hostorder described in
the proposal is not yet implemented in this version of the patch.

4. In a URL-style connect string, the parameter loadBalanceHosts=true is
treated as equivalent to 'hostorder=random', for compatibility with JDBC.

5. Added a new parameter readonly=boolean. If this parameter is false (the
default), then upon successful connection a check is performed that the
backend is not in recovery state. If it is in recovery, the connection is
not considered usable and the next host is tried.

6. Added a new parameter failover_timeout. If no usable host is found and
this parameter is specified, hosts are retried cyclically until this
timeout expires. This gives cluster management software time to promote
one of the standbys to master if the master fails. By default there are
no retries.
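The selection behaviour of items 5 and 6 can be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the patch's code: the connection attempt is hidden behind a hypothetical try_host callback (a real client would connect through libpq and might detect a standby with something like `SELECT pg_is_in_recovery()`), and the one-second pause between retry cycles is an assumption. The names attempt_result, try_host_fn and choose_host are hypothetical.

```c
#include <stdbool.h>
#include <time.h>
#include <unistd.h>

/* Outcome of one connection attempt (hypothetical type, for illustration). */
typedef enum {
    ATTEMPT_FAILED,      /* could not connect */
    ATTEMPT_OK_PRIMARY,  /* connected, backend not in recovery */
    ATTEMPT_OK_STANDBY   /* connected, backend in recovery */
} attempt_result;

/* Hypothetical callback: try to connect to host number i. */
typedef attempt_result (*try_host_fn)(int i, void *ctx);

/*
 * Try hosts in the (already ordered) list and return the index of the
 * first usable one, or -1.  With readonly == false a standby is rejected
 * and the next host is tried.  If failover_timeout_sec > 0, whole passes
 * over the list are repeated until the timeout expires; otherwise a
 * single pass is made (no retries).
 */
int choose_host(int nhosts, try_host_fn try_host, void *ctx,
                bool readonly, int failover_timeout_sec)
{
    time_t deadline = time(NULL) + failover_timeout_sec;

    for (;;) {
        for (int i = 0; i < nhosts; i++) {
            attempt_result r = try_host(i, ctx);

            if (r == ATTEMPT_OK_PRIMARY ||
                (r == ATTEMPT_OK_STANDBY && readonly))
                return i;
        }
        if (failover_timeout_sec <= 0 || time(NULL) >= deadline)
            return -1;
        sleep(1);  /* assumed pause: let cluster software promote a standby */
    }
}
```

Host ordering (sequential or random) would be applied to the list before this loop runs, so the loop itself does not care which order was chosen.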

Some implementation notes:

1. The pghost field in the PGconn structure now stores a comma-separated
list of hosts, which is parsed in connectDBStart. Consequently, the
expected results of some tests in src/interfaces/libpq/test have changed.

2. A URL with a colon but no port number after the host is no longer
considered valid.

3. With hostorder=random we have to seed the libc random number generator.
Some care was taken not to lose entropy if the generator was initialized
by the application before connecting, and to ensure different random
values if several connections are made from the same application within
one second (even in a single thread). The RNG is seeded with the XOR of
the current time, a random value drawn from this RNG before seeding (if
it was seeded earlier, this preserves its entropy), and the address of
the connection structure. It may be worth adding the thread id or pid,
but there is no portable way of doing so, so it would need testing on all
supported platforms.
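Implementation notes 1 to 3 can be illustrated with a small, self-contained sketch: splitting a pghost-style comma-separated string with optional per-host ports (rejecting "host:" with no port number), seeding rand() from the XOR described in note 3, and shuffling the list for hostorder=random. The names host_entry, parse_host_list, seed_host_rng and shuffle_hosts are hypothetical and not taken from the patch. The parser does not handle bracketed IPv6 literals, the Fisher-Yates shuffle ignores the slight modulo bias of rand() % n, and the parse/seed/shuffle order is an assumption.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define HOST_LEN 256

typedef struct {
    char host[HOST_LEN];
    int  port;
} host_entry;  /* hypothetical */

/*
 * Parse e.g. "db1:5433,db2,db3:5434".  A ":port" suffix overrides
 * default_port for that host only.  Returns the number of hosts, or -1
 * for an empty host name, a colon without a port, a bad port, or too
 * many hosts.
 */
int parse_host_list(const char *list, int default_port,
                    host_entry *out, int max_hosts)
{
    int n = 0;
    const char *p = list;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = end ? (size_t) (end - p) : strlen(p);
        const char *colon = memchr(p, ':', len);
        size_t host_len = colon ? (size_t) (colon - p) : len;

        if (n >= max_hosts || host_len == 0 || host_len >= HOST_LEN)
            return -1;
        memcpy(out[n].host, p, host_len);
        out[n].host[host_len] = '\0';
        out[n].port = default_port;

        if (colon) {
            char *stop;
            long port;

            if (colon + 1 == p + len)  /* "host:" with no port number */
                return -1;
            port = strtol(colon + 1, &stop, 10);
            if (stop != p + len || port <= 0 || port > 65535)
                return -1;
            out[n].port = (int) port;
        }
        n++;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return n;
}

/* Seed rand() with: current time XOR a value drawn before reseeding XOR conn's address. */
void seed_host_rng(const void *conn)
{
    unsigned int seed = (unsigned int) time(NULL);

    seed ^= (unsigned int) rand();  /* keeps entropy of an earlier srand() */
    seed ^= (unsigned int) (uintptr_t) conn;
    srand(seed);
}

/* hostorder=random: Fisher-Yates shuffle of the parsed list. */
void shuffle_hosts(host_entry *hosts, int n)
{
    for (int i = n - 1; i > 0; i--) {
        int j = rand() % (i + 1);
        host_entry tmp = hosts[i];

        hosts[i] = hosts[j];
        hosts[j] = tmp;
    }
}
```

Because rand() is consulted before reseeding, an application that called srand() itself still contributes its generator state to the new seed, and the structure address differs between connections opened within the same second.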

Attachment: libpq-failover-1.patch (text/x-diff, 29.8 KB)
