Re: Support load balancing in libpq

From: Michael Banck <mbanck(at)gmx(dot)net>
To: Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>
Cc: Aleksander Alekseev <aleksander(at)timescale(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Support load balancing in libpq
Date: 2022-09-10 21:43:11
Message-ID: 631d04f0.7b0a0220.1140d.9a91@mx.google.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

the patch no longer applies cleanly, please rebase (it's trivial).

I don't like the provided commit message very much, I think the
discussion about pgJDBC having had load balancing for a while belongs
elsewhere.

On Wed, Jun 22, 2022 at 07:54:19AM +0000, Jelte Fennema wrote:
> I tried to stay in line with the naming of this same option in JDBC and
> Npgsql, where it's called "loadBalanceHosts" and "Load Balance Hosts"
> respectively. So, actually to be more in line it should be the option for
> libpq should be called "load_balance_hosts" (not "loadbalance" like
> in the previous patch). I attached a new patch with the name of the
> option changed to this.

Maybe my imagination is not so great, but what else than hosts could we
possibly load-balance? I don't mind calling it load_balance, but I also
don't feel very strongly one way or the other and this is clearly
bikeshed territory.

> I also don't think the name is misleading. Randomization of hosts will
> automatically result in balancing the load across multiple hosts. This is
> assuming more than a single connection is made using the connection
> string, either on the same client node or on different client nodes. I think
> I think is a fair assumption to make. Also note that this patch does not load
> balance queries, it load balances connections. This is because libpq works
> at the connection level, not query level, due to session level state.

I agree.

Also, I think the scope is ok for a first implementation (but see
below). You or others could possibly further enhance the algorithm in
the future, but it seems to be useful as-is.

> I agree it is indeed fairly simplistic load balancing.

If I understand correctly, you've added DNS-based load balancing on top
of just shuffling the provided hostnames. This makes sense if a
hostname is backed by more than one IP address in the context of load
balancing, but it also complicates the patch. So I'm wondering how much
shorter the patch would be if you leave that out for now?

On the other hand, I believe pgJDBC keeps track of which hosts are up or
down and only load balances among the ones which are up (maybe
rechecking after a timeout? I don't remember), is this something you're
doing, or did you consider it?

Some quick remarks on the patch:

/* OK, scan this addrlist for a working server address */
- conn->addr_cur = conn->addrlist;
reset_connection_state_machine = true;
conn->try_next_host = false;

The comment might need to be updated.

+ int naddr; /* # of addrs returned by getaddrinfo */

This is spelt "number of" in several other places in the file, and we
still have enough space to spell it out here as well without a
line-wrap.

Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2022-09-10 22:00:08 Re: Index ordering after IS NULL
Previous Message Jeff Janes 2022-09-10 21:28:10 Index ordering after IS NULL