Re: Support load balancing in libpq

From: Michael Banck <mbanck(at)gmx(dot)net>
To: Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>
Cc: Maxim Orlov <orlovmg(at)gmail(dot)com>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support load balancing in libpq
Date: 2023-01-06 17:21:23
Message-ID: 63b85894.170a0220.304c7.19a6@mx.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On Tue, Nov 29, 2022 at 02:57:08PM +0000, Jelte Fennema wrote:
> Attached is a new version with the tests cleaned up a bit (more
> comments mostly).
>
> @Michael, did you have a chance to look at the last version? Because I
> feel that the patch is pretty much ready for a committer to look at,
> at this point.

I had another look; it still applies and tests pass. I still find the
distribution over three hosts a bit skewed (even when OpenSSL is
enabled, which wasn't the case when I first tested it), but couldn't
find a general fault and it worked well enough in my testing.

I wonder whether making the parameter a boolean will paint us into a
corner, and whether maybe additional modes could be envisioned in the
future, but I can't think of some right now (you can pretty neatly
restrict load-balancing to standbys by setting
target_session_attrs=standby in addition). Maybe a way to change the
behaviour if a dns hostname is backed by multiple entries?

Some further (mostly nitpicking) comments on the patch:

> From 6e20bb223012b666161521b5e7249c066467a5f3 Mon Sep 17 00:00:00 2001
> From: Jelte Fennema <github-tech(at)jeltef(dot)nl>
> Date: Mon, 12 Sep 2022 09:44:06 +0200
> Subject: [PATCH v5] Support load balancing in libpq
>
> Load balancing connections across multiple read replicas is a pretty
> common way of scaling out read queries. There are two main ways of doing
> so, both with their own advantages and disadvantages:
> 1. Load balancing at the client level
> 2. Load balancing by connecting to an intermediary load balancer
>
> Both JBDC (Java) and Npgsql (C#) already support client level load
> balancing (option #1). This patch implements client level load balancing
> for libpq as well. To stay consistent with the JDBC and Npgsql part of
> the ecosystem, a similar implementation and name for the option are
> used.

I still think all of the above has no business in the commit message,
though maybe the first paragraph can stay as introduction.

> It contains two levels of load balancing:
> 1. The given hosts are randomly shuffled, before resolving them
> one-by-one.
> 2. Once a host its addresses get resolved, those addresses are shuffled,
> before trying to connect to them one-by-one.

That's fine.

What should be in the commit message is at least a mention of what the
new connection parameter is called and possibly what is done to
accomplish it.

But the committer will pick this up if needed I guess.

> diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml
> index f9558dec3b..6ce7a0c9cc 100644
> --- a/doc/src/sgml/libpq.sgml
> +++ b/doc/src/sgml/libpq.sgml
> @@ -1316,6 +1316,54 @@ postgresql://%2Fvar%2Flib%2Fpostgresql/dbname
> </listitem>
> </varlistentry>
>
> + <varlistentry id="libpq-load-balance-hosts" xreflabel="load_balance_hosts">
> + <term><literal>load_balance_hosts</literal></term>
> + <listitem>
> + <para>
> + Controls whether the client load balances connections across hosts and
> + adresses. The default value is 0, meaning off, this means that hosts are
> + tried in order they are provided and addresses are tried in the order
> + they are received from DNS or a hosts file. If this value is set to 1,
> + meaning on, the hosts and addresses that they resolve to are tried in
> + random order. Subsequent queries once connected will still be sent to
> + the same server. Setting this to 1, is mostly useful when opening
> + multiple connections at the same time, possibly from different machines.
> + This way connections can be load balanced across multiple Postgres
> + servers.
> + </para>
> + <para>
> + When providing multiple hosts, these hosts are resolved in random order.
> + Then if that host resolves to multiple addresses, these addresses are
> + connected to in random order. Only once all addresses for a single host
> + have been tried, the addresses for the next random host will be
> + resolved. This behaviour can lead to non-uniform address selection in
> + certain cases. Such as when not all hosts resolve to the same number of
> + addresses, or when multiple hosts resolve to the same address. So if you
> + want uniform load balancing, this is something to keep in mind. However,
> + non-uniform load balancing also has usecases, e.g. providing the
> + hostname of a larger server multiple times in the host string so it gets
> + more requests.
> + </para>
> + <para>
> + When using this setting it's recommended to also configure a reasonable
> + value for <xref linkend="libpq-connect-connect-timeout"/>. Because then,
> + if one of the nodes that are used for load balancing is not responding,
> + a new node will be tried.
> + </para>
> + </listitem>
> + </varlistentry>

I think this whole section is generally fine, but needs some
copyediting.

> + <varlistentry id="libpq-random-seed" xreflabel="random_seed">
> + <term><literal>random_seed</literal></term>
> + <listitem>
> + <para>
> + Sets the random seed that is used by <xref linkend="libpq-load-balance-hosts"/>
> + to randomize the host order. This option is mostly useful when running
> + tests that require a stable random order.
> + </para>
> + </listitem>
> + </varlistentry>

I wonder whether this needs to be documented if it is mostly a
development/testing parameter?

> diff --git a/src/include/libpq/pqcomm.h b/src/include/libpq/pqcomm.h
> index fcf68df39b..39e93b1392 100644
> --- a/src/include/libpq/pqcomm.h
> +++ b/src/include/libpq/pqcomm.h
> @@ -27,6 +27,12 @@ typedef struct
> socklen_t salen;
> } SockAddr;
>
> +typedef struct
> +{
> + int family;
> + SockAddr addr;
> +} AddrInfo;

That last line looks weirdly indented compared to SockAddr; in the
struct above.

> /* Configure the UNIX socket location for the well known port. */
>
> #define UNIXSOCK_PATH(path, port, sockdir) \
> diff --git a/src/interfaces/libpq/fe-connect.c b/src/interfaces/libpq/fe-connect.c
> index f88d672c6c..b4d3613713 100644
> --- a/src/interfaces/libpq/fe-connect.c
> +++ b/src/interfaces/libpq/fe-connect.c
> @@ -241,6 +241,14 @@ static const internalPQconninfoOption PQconninfoOptions[] = {
> "Fallback-Application-Name", "", 64,
> offsetof(struct pg_conn, fbappname)},
>
> + {"load_balance_hosts", NULL, NULL, NULL,
> + "Load-Balance", "", 1, /* should be just '0' or '1' */
> + offsetof(struct pg_conn, loadbalance)},
> +
> + {"random_seed", NULL, NULL, NULL,
> + "Random-Seed", "", 10, /* strlen(INT32_MAX) == 10 */
> + offsetof(struct pg_conn, randomseed)},
> +
> {"keepalives", NULL, NULL, NULL,
> "TCP-Keepalives", "", 1, /* should be just '0' or '1' */
> offsetof(struct pg_conn, keepalives)},
> @@ -379,6 +387,7 @@ static bool fillPGconn(PGconn *conn, PQconninfoOption *connOptions);
> static void freePGconn(PGconn *conn);
> static void closePGconn(PGconn *conn);
> static void release_conn_addrinfo(PGconn *conn);
> +static bool store_conn_addrinfo(PGconn *conn, struct addrinfo *addrlist);
> static void sendTerminateConn(PGconn *conn);
> static PQconninfoOption *conninfo_init(PQExpBuffer errorMessage);
> static PQconninfoOption *parse_connection_string(const char *connstr,
> @@ -424,6 +433,9 @@ static void pgpassfileWarning(PGconn *conn);
> static void default_threadlock(int acquire);
> static bool sslVerifyProtocolVersion(const char *version);
> static bool sslVerifyProtocolRange(const char *min, const char *max);
> +static int loadBalance(PGconn *conn);
> +static bool parse_int_param(const char *value, int *result, PGconn *conn,
> + const char *context);
>
>
> /* global variable because fe-auth.c needs to access it */
> @@ -1007,6 +1019,46 @@ parse_comma_separated_list(char **startptr, bool *more)
> return p;
> }
>
> +/*
> + * Initializes the prng_state field of the connection. We want something
> + * unpredictable, so if possible, use high-quality random bits for the
> + * seed. Otherwise, fall back to a seed based on timestamp and PID.
> + */
> +static bool
> +libpq_prng_init(PGconn *conn)
> +{
> + if (unlikely(conn->randomseed))
> + {
> + int rseed;
> +
> + if (!parse_int_param(conn->randomseed, &rseed, conn, "random_seed"))
> + {
> + return false;
> + };

I think it's project policy to drop the braces for single statements in
if blocks.

> + pg_prng_seed(&conn->prng_state, rseed);
> + }
> + else if (unlikely(!pg_prng_strong_seed(&conn->prng_state)))
> + {
> + uint64 rseed;
> + time_t now = time(NULL);
> +
> + /*
> + * Since PIDs and timestamps tend to change more frequently in their
> + * least significant bits, shift the timestamp left to allow a larger
> + * total number of seeds in a given time period. Since that would
> + * leave only 20 bits of the timestamp that cycle every ~1 second,
> + * also mix in some higher bits.
> + */
> + rseed = ((uint64) getpid()) ^
> + ((uint64) now << 12) ^
> + ((uint64) now >> 20);
> +
> + pg_prng_seed(&conn->prng_state, rseed);
> + }
> + return true;
> +}
> +
> +
> /*

Additional newline.

> @@ -1164,6 +1217,36 @@ connectOptions2(PGconn *conn)
> }
> }
>
> + if (loadbalancehosts < 0)
> + {
> + appendPQExpBufferStr(&conn->errorMessage,
> + libpq_gettext("loadbalance parameter must be an integer\n"));
> + return false;
> + }
> +
> + if (loadbalancehosts)
> + {
> + if (!libpq_prng_init(conn))
> + {
> + return false;
> + }
> +
> + /*
> + * Shuffle connhost with a Durstenfeld/Knuth version of the
> + * Fisher-Yates shuffle. Source:
> + * https://en.wikipedia.org/wiki/Fisher%E2%80%93Yates_shuffle#The_modern_algorithm
> + */
> + for (i = conn->nconnhost - 1; i > 0; i--)
> + {
> + int j = pg_prng_uint64_range(&conn->prng_state, 0, i);
> + pg_conn_host temp = conn->connhost[j];
> +
> + conn->connhost[j] = conn->connhost[i];
> + conn->connhost[i] = temp;
> + }
> + }
> +
> +
> /*

Additional newline.

> @@ -1726,6 +1809,27 @@ connectFailureMessage(PGconn *conn, int errorno)
> libpq_append_conn_error(conn, "\tIs the server running on that host and accepting TCP/IP connections?");
> }
>
> +/*
> + * Should we load balance across hosts? Returns 1 if yes, 0 if no, and -1 if
> + * conn->loadbalance is set to a value which is not parseable as an integer.
> + */
> +static int
> +loadBalance(PGconn *conn)
> +{
> + char *ep;
> + int val;
> +
> + if (conn->loadbalance == NULL)
> + {
> + return 0;
> + }

Another case of additional braces.

> + val = strtol(conn->loadbalance, &ep, 10);
> + if (*ep)
> + return -1;
> + return val != 0 ? 1 : 0;
> +}
> +
> +
> /*

Additional newline.

> @@ -4041,6 +4154,63 @@ freePGconn(PGconn *conn)
> free(conn);
> }
>
> +
> +/*

Additional newline.

> + * Copies over the AddrInfos from addrlist to the PGconn.
> + */
> +static bool
> +store_conn_addrinfo(PGconn *conn, struct addrinfo *addrlist)
> +{
> + struct addrinfo *ai = addrlist;
> +
> + conn->whichaddr = 0;
> +
> + conn->naddr = 0;
> + while (ai)
> + {
> + ai = ai->ai_next;
> + conn->naddr++;
> + }
> +
> + conn->addr = calloc(conn->naddr, sizeof(AddrInfo));
> + if (conn->addr == NULL)
> + {
> + return false;
> + }

Additional braces.

> @@ -4048,11 +4218,10 @@ freePGconn(PGconn *conn)
> static void
> release_conn_addrinfo(PGconn *conn)
> {
> - if (conn->addrlist)
> + if (conn->addr)
> {
> - pg_freeaddrinfo_all(conn->addrlist_family, conn->addrlist);
> - conn->addrlist = NULL;
> - conn->addr_cur = NULL; /* for safety */
> + free(conn->addr);
> + conn->addr = NULL;
> }
> }
>
> diff --git a/src/interfaces/libpq/libpq-int.h b/src/interfaces/libpq/libpq-int.h
> index 512762f999..76ee988038 100644
> --- a/src/interfaces/libpq/libpq-int.h
> +++ b/src/interfaces/libpq/libpq-int.h
> @@ -82,6 +82,8 @@ typedef struct
> #endif
> #endif /* USE_OPENSSL */
>
> +#include "common/pg_prng.h"
> +
> /*
> * POSTGRES backend dependent Constants.
> */
> @@ -373,6 +375,8 @@ struct pg_conn
> char *pgpassfile; /* path to a file containing password(s) */
> char *channel_binding; /* channel binding mode
> * (require,prefer,disable) */
> + char *loadbalance; /* load balance over hosts */
> + char *randomseed; /* seed for randomization of load balancing */
> char *keepalives; /* use TCP keepalives? */

A bit unclear why you put the variables at this point in the list, but
the list doesn't seem to be ordered strictly anyway; still, maybe they
would fit best at the bottom below target_session_attrs?

> char *keepalives_idle; /* time between TCP keepalives */
> char *keepalives_interval; /* time between TCP keepalive
> @@ -461,8 +465,10 @@ struct pg_conn
> PGTargetServerType target_server_type; /* desired session properties */
> bool try_next_addr; /* time to advance to next address/host? */
> bool try_next_host; /* time to advance to next connhost[]? */
> - struct addrinfo *addrlist; /* list of addresses for current connhost */
> - struct addrinfo *addr_cur; /* the one currently being tried */
> + int naddr; /* number of addrs returned by getaddrinfo */
> + int whichaddr; /* the addr currently being tried */

Address(es) is always spelt out in the comments, those two addr(s)
should also I think.

> diff --git a/src/interfaces/libpq/t/003_loadbalance.pl b/src/interfaces/libpq/t/003_loadbalance.pl
> new file mode 100644
> index 0000000000..07eddbe9cc
> --- /dev/null
> +++ b/src/interfaces/libpq/t/003_loadbalance.pl
> @@ -0,0 +1,167 @@
> +# Copyright (c) 2022, PostgreSQL Global Development Group

Copyright bump needed.

Cheers,

Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-01-06 17:32:52 Re: ATTACH PARTITION seems to ignore column generation status
Previous Message Tom Lane 2023-01-06 16:58:52 Re: postgres_fdw: using TABLESAMPLE to collect remote sample