Re: [EXTERNAL] Support load balancing in libpq

From: Jelte Fennema <postgres(at)jeltef(dot)nl>
To: Daniel Gustafsson <daniel(at)yesql(dot)se>
Cc: "Gregory Stark (as CFM)" <stark(dot)cfm(at)gmail(dot)com>, Andrey Borodin <amborodin86(at)gmail(dot)com>, Jacob Champion <jchampion(at)timescale(dot)com>, Maxim Orlov <orlovmg(at)gmail(dot)com>, Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>, Michael Banck <mbanck(at)gmx(dot)net>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: [EXTERNAL] Support load balancing in libpq
Date: 2023-03-17 08:50:41
Message-ID: CAGECzQTDXTJzwWJLcBksh=FLzE6xmxq2XyF_5QJp-TXVu=DOvA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> The documentation lists the modes disabled and random, but I wonder if it's
> worth expanding the docs to mention that "disabled" is pretty much a round
> robin load balancing scheme? It reads a bit odd to present load balancing
> without a mention of round robin balancing given how common it is.

I think you misunderstood what I meant in that section, so I rewrote
it to hopefully be clearer. Because disabled really isn't the same as
round-robin.

> This removes all uses of conn->addrlist_family and that struct member can be
> removed.

done

> s/stanby/standby/
> s/Postgres/<productname>PostgreSQL</productname>/

done

> The documentation typically use a less personal form, I would suggest something
> along the lines of:
>
> "If uniform load balancing is required then an external load balancing tool
> must be used. Non-uniform load balancing can also be used to skew the
> results, e.g. by providing the.."

rewrote this to stop using "you" and expanded a bit on the topic

> libpq_append_conn_error(conn, "unable to initiate random number generator");

done

> -#ifndef WIN32
> +/* MinGW has sys/time.h, but MSVC doesn't */
> +#ifndef _MSC_VER
> #include <sys/time.h>
> This seems unrelated to the patch in question, and should be a separate commit IMO.

It's not really unrelated. This only started to be needed because
libpq_prng_init calls gettimeofday . That did not work on MinGW
systems. Before this patch libpq was never calling gettimeofday. So I
think it makes sense to leave it in the commit.

> + LOAD_BALANCE_RANDOM, /* Read-write server */
> I assume this comment is a copy/paste and should have been reflecting random order?

Yes, done

> +++ b/src/interfaces/libpq/t/003_loadbalance_host_list.pl
> Nitpick, but we should probably rename this load_balance to match the parameter
> being tested.

Done

> A test
> which require root permission level manual system changes stand a very low
> chance of ever being executed, and as such will equate to dead code that may
> easily be broken or subtly broken.

While I definitely agree that it makes it hard to execute, I don't
think that means it will be executed nearly as few times as you
suggest. Maybe you missed it, but I modified the .cirrus.yml file to
configure the hosts file for both Linux and Windows runs. So, while I
agree it is unlikely to be executed manually by many people, it would
still be run on every commit fest entry (which should capture most
issues that I can imagine could occur).

> I am also not a fan of the random_seed parameter.

Fair enough. Removed

> A few ideas:
>
> * Add basic tests for the load_balance_host connection param only accepting
> sane values
>
> * Alter the connect_ok() tests in 003_loadbalance_host_list.pl to not require
> random_seed but instead using randomization. Thinking out loud, how about
> something along these lines?
> - Passing a list of unreachable hostnames with a single good hostname
> should result in a connection.
> - Passing a list of one good hostname should result in a connection
> - Passing a list on n good hostname (where all are equal) should result in
> a connection

Implemented all these.

> - Passing a list of n unreachable hostnames should result in log entries
> for n broken resolves in some order, and running that test n' times
> shouldn't - statistically - result in the same order for a large enough n'.

I didn't implement this one. Instead I went for another statistics
based approach with working hosts (see test for details).

> * Remove random_seed and 004_loadbalance_dns.pl

I moved 004_load_balance_dns.pl to a separate commit (after making
similar random_seed removal related changes to it). As explained above
I think it's worth it to have it because it gets executed in CI. But
feel free to commit only the main patch, if you disagree.

Attachment Content-Type Size
v13-0005-Remove-unnecessary-check-from-Fisher-Yates-imple.patch application/octet-stream 1.1 KB
v13-0002-Refactor-libpq-to-store-addrinfo-in-a-libpq-owne.patch application/octet-stream 11.6 KB
v13-0003-Support-load-balancing-in-libpq.patch application/octet-stream 17.5 KB
v13-0001-libpq-Run-pgindent-after-a9e9a9f32b3.patch application/octet-stream 22.3 KB
v13-0004-Add-DNS-based-libpq-load-balancing-test.patch application/octet-stream 8.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2023-03-17 08:51:56 Re: logical decoding and replication of sequences, take 2
Previous Message shiy.fnst@fujitsu.com 2023-03-17 08:45:29 RE: Dropped and generated columns might cause wrong data on subs when REPLICA IDENTITY FULL