Poor load balancing performance in PGPool 4.6 on PG13, any config suggestions?

From:	TV <tvfan2014(at)gmail(dot)com>
To:	pgpool-general(at)lists(dot)postgresql(dot)org
Subject:	Poor load balancing performance in PGPool 4.6 on PG13, any config suggestions?
Date:	2025-07-16 12:13:49
Message-ID:	CAFjdVW8-2kJLr+bRzGXqs3Xg6cgq-hwDg9N5rg8cgnJZ6=WcOA@mail.gmail.com
Lists:	pgpool-general

Just to give a bit of background: we recently migrated from our old setup to
new physical servers, and are running Ubuntu 24 and the latest (4.6.2) version
of pgpool. The migration went fairly well, but we are noticing that
performance isn't any better than on the old servers; frankly, it seems
worse. I was wondering if some of the pgpool pros could look over our
config and perhaps recommend some changes/tuning? Hardware-wise, it's
pretty beefy: we have 1TB of RAM to play with and 80 cores (2 processors, each
with 20 physical cores and 40 virtual), so hardware definitely doesn't seem to
be the problem. Here are some 'highlights' from pgpool.conf; feel free to ask
for other settings if they'll help clear up the picture:

num_init_children = 3500
max_pool = 1
child_life_time = 0
child_max_connections = 0
connection_life_time = 500
client_idle_limit = 600
process_management_mode = dynamic
process_management_strategy = gentle
min_spare_children = 50
max_spare_children = 100
connection_cache = on
load_balance_mode = on
disable_load_balance_on_write = 'transaction'
statement_level_load_balance = on

This is a 4-node cluster running PG13, and backend_weight is set to 1 for
all 4 nodes.
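
For readers following along, here is a rough sketch of the connection arithmetic these settings imply. It relies on the rule of thumb from the Pgpool-II documentation that max_pool * num_init_children should not exceed max_connections - superuser_reserved_connections on each backend (doubled if query cancels need to be handled). The num_init_children and max_pool values come from the config above; the max_connections and superuser_reserved_connections values are hypothetical, since the post does not state them, and the function names are made up for illustration.

```python
# Rough sketch: how many backend connections the posted pgpool.conf can demand.
# num_init_children and max_pool come from the post; the PostgreSQL-side values
# used below are HYPOTHETICAL, because the post does not give them.


def required_backend_connections(num_init_children: int, max_pool: int,
                                 query_cancel: bool = False) -> int:
    """Upper bound on connections pgpool may open to EACH backend node."""
    demand = num_init_children * max_pool
    # Handling query cancels can need a second connection per child.
    return demand * 2 if query_cancel else demand


def fits(num_init_children: int, max_pool: int, max_connections: int,
         superuser_reserved_connections: int = 3,
         query_cancel: bool = False) -> bool:
    """True if the backend has room for pgpool's worst-case demand."""
    available = max_connections - superuser_reserved_connections
    needed = required_backend_connections(num_init_children, max_pool,
                                          query_cancel)
    return needed <= available


if __name__ == "__main__":
    # Values from the post: num_init_children = 3500, max_pool = 1.
    print(required_backend_connections(3500, 1))
    # Hypothetical backend setting: max_connections = 4000.
    print(fits(3500, 1, max_connections=4000))
    print(fits(3500, 1, max_connections=4000, query_cancel=True))
```

If the check fails for the real max_connections on any of the four nodes, clients can end up waiting on or being refused backend connections, which is worth ruling out before tuning anything else.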

Some of the errors we are seeing in pgpool logs:
2025-07-15 10:57:32: pid 2629089: CONTEXT: while checking replication time lag
2025-07-15 10:57:32: pid 2629089: LOCATION: pool_worker_child.c:644
2025-07-15 10:57:33: pid 3892376: LOG: Error message from backend: DB node id: 2 message: "canceling statement due to conflict with recovery"
2025-07-15 10:57:33: pid 3892376: LOCATION: pool_proto_modules.c:3226
2025-07-15 10:57:33: pid 3892376: FATAL: unable to read data from DB node 2
2025-07-15 10:57:33: pid 3892376: DETAIL: EOF encountered with backend
2025-07-15 10:57:33: pid 3892376: LOCATION: pool_stream.c:274
2025-07-15 10:57:33: pid 2629004: LOG: child process with pid: 3892376 exited with success and will not be restarted
2025-07-15 10:57:33: pid 2629004: LOCATION: pgpool_main.c:2059

Also this:
2025-07-15 11:02:22: pid 3892505: ERROR: unable to read data from DB node 2
2025-07-15 11:02:22: pid 3892505: DETAIL: do not failover because failover_on_backend_error is off
2025-07-15 11:02:22: pid 3892505: LOCATION: pool_stream.c:407
2025-07-15 11:02:22: pid 3892505: WARNING: write on backend 2 failed with error :"Broken pipe"
2025-07-15 11:02:22: pid 3892505: DETAIL: while trying to write data from offset: 0 wlen: 17
2025-07-15 11:02:22: pid 3892505: LOCATION: pool_stream.c:714
2025-07-15 11:02:22: pid 3892505: WARNING: write on backend 2 failed with error :"Broken pipe"
2025-07-15 11:02:22: pid 3892505: DETAIL: while trying to write data from offset: 0 wlen: 5
2025-07-15 11:02:22: pid 3892505: LOCATION: pool_stream.c:714

We saw this as well:
2025-07-15 11:05:12: pid 2629089: CONTEXT: while checking replication time lag
2025-07-15 11:05:12: pid 2629089: LOCATION: pool_worker_child.c:644
2025-07-15 11:05:19: pid 3891928: ERROR: unable to read data from frontend
2025-07-15 11:05:19: pid 3891928: DETAIL: socket read function returned -1
2025-07-15 11:05:19: pid 3891928: LOCATION: pool_stream.c:414
2025-07-15 11:05:19: pid 3891928: LOG: pool_send_and_wait: Error or notice message from backend: DB node id: 1 backend pid: 3938180 statement: "ABORT" message: "terminating connection due to conflict with recovery"
2025-07-15 11:05:19: pid 3891928: LOCATION: pool_proto_modules.c:3955
2025-07-15 11:05:19: pid 3891928: LOG: pool_send_and_wait: Error or notice message from backend: DB node id: 2 backend pid: 3929256 statement: "ABORT" message: "terminating connection due to conflict with recovery"
2025-07-15 11:05:19: pid 3891928: LOCATION: pool_proto_modules.c:3955
2025-07-15 11:05:19: pid 3891928: LOG: pool_send_and_wait: Error or notice message from backend: DB node id: 3 backend pid: 3929098 statement: "ABORT" message: "terminating connection due to conflict with recovery"
2025-07-15 11:05:19: pid 3891928: LOCATION: pool_proto_modules.c:3955
2025-07-15 11:05:19: pid 3891928: LOG: pool_send_and_wait: Error or notice message from backend: DB node id: 0 backend pid: 3060000 statement: "ABORT" message: "terminating connection due to idle-in-transaction timeout"
2025-07-15 11:05:19: pid 3891928: LOCATION: pool_proto_modules.c:3955
2025-07-15 11:05:19: pid 3891928: WARNING: write on backend 1 failed with error :"Broken pipe"
2025-07-15 11:05:19: pid 3891928: DETAIL: while trying to write data from offset: 0 wlen: 5
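
To see whether these failures cluster on one backend, the log can be tallied by severity and node id. Below is a minimal sketch, assuming each log entry sits on one physical line in the form "timestamp: pid N: LEVEL: message" as in the excerpts here (entries wrapped by email would need to be rejoined first). The function name summarize and the regular expressions are illustrative, not part of pgpool, and the sample lines are copied from the excerpts in this message.

```python
import re
from collections import Counter

# Matches entries such as:
#   2025-07-15 10:57:33: pid 3892376: FATAL: unable to read data from DB node 2
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}): "
    r"pid (?P<pid>\d+): (?P<level>[A-Z]+): (?P<msg>.*)$"
)
# Matches "DB node 2", "DB node id: 2" and "backend 2", but not "backend pid: ...".
NODE_RE = re.compile(r"(?:DB node(?: id:)?|backend) (\d+)")


def summarize(lines):
    """Return (count per severity, count per backend node id)."""
    levels = Counter()
    nodes = Counter()
    for line in lines:
        m = LINE_RE.match(line.strip())
        if not m:
            continue  # not a log entry in the assumed format
        levels[m.group("level")] += 1
        node = NODE_RE.search(m.group("msg"))
        if node:
            nodes[int(node.group(1))] += 1
    return levels, nodes


# Sample entries taken from the excerpts in this message.
SAMPLE = [
    '2025-07-15 10:57:33: pid 3892376: LOG: Error message from backend: '
    'DB node id: 2 message: "canceling statement due to conflict with recovery"',
    '2025-07-15 10:57:33: pid 3892376: FATAL: unable to read data from DB node 2',
    '2025-07-15 11:02:22: pid 3892505: ERROR: unable to read data from DB node 2',
    '2025-07-15 11:02:22: pid 3892505: WARNING: write on backend 2 failed with '
    'error :"Broken pipe"',
    '2025-07-15 11:05:19: pid 3891928: WARNING: write on backend 1 failed with '
    'error :"Broken pipe"',
]

if __name__ == "__main__":
    levels, nodes = summarize(SAMPLE)
    print(dict(levels))
    print(dict(nodes))
```

Running the same function over the full log file (for example, summarize(open(path)) with the real log path) would show whether one standby accounts for most of the errors or whether they are spread evenly across nodes.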

Some of these seem to suggest connectivity problems. Is there anything
you can suggest looking into? It's also worth noting that if we bypass
the pgpool VIP and connect the applications directly to the DB master node,
no problems are reported, so it does seem to be something in our
pgpool setup...

Any help will be much appreciated.

Responses

Re: Poor load balancing performance in PGPool 4.6 on PG13, any config suggestions? at 2025-07-16 12:32:03 from Achilleas Mantzios
