| From: | Adam Blomeke <adam(dot)blomeke(at)gmail(dot)com> |
|---|---|
| To: | pgpool-general(at)lists(dot)postgresql(dot)org |
| Subject: | Pgpool can't detect database status properly |
| Date: | 2025-12-12 21:25:59 |
| Message-ID: | CAG9Amsj61Oy7YGJw40uv7fedZwabLu2b5dVmJK4aDrZLDMVj+w@mail.gmail.com |
| Lists: | pgpool-general |
I'm resending this as it's been sitting in the moderation queue for a
while, possibly because I didn't have a subject line. Anyway, any help
would be great. Thanks!
I'm setting up a pgpool cluster to replace a single-node database in my
environment. The single node is separate from the cluster at the moment.
When it's time to cut over, I'm going to redo the backup/restore, do an
upgrade from PostgreSQL 15 to 18, and then bring up the cluster and have it
take over the old IP.
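The rough shape of that cutover, for context (hostnames and paths below are
placeholders, not the real values):
pg_dumpall -h <old-pg15-host> -U postgres -f /tmp/pg15_full.sql   # full logical dump from the pg15 node
psql -h 10.6.1.199 -U postgres -f /tmp/pg15_full.sql              # restore into the new pg18 primary
# re-seed the standby with pg_basebackup, then move the old IP over to the cluster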
*Environment:*
- pgpool-II version: 4.6.3 (chirikoboshi)
- PostgreSQL version: 18
- OS: RHEL9
- Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197, 10.6.1.198)
+ 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby)
*Issue:*
I have pgpool configured, and I set it up using the scripts and config
files from a different instance, one that has been running just fine for a
year and a half or so. The issue I'm experiencing is that when I
detach/reattach a node, it sits in "waiting" indefinitely and never
transitions to "up". I have to manually edit the status file to "up" to get
pgpool to agree that it is, and when I try to drop the node it doesn't
actually drop it; it just goes back into "waiting". I also don't see any
connection attempts from the pgpool server to the postgres nodes in the
postgres logs. I've confirmed that the pgpool server can run the postgres
commands against the backends from the command line. I've tried this both
running pgpool as a service and running it directly from the command line,
with no difference in behavior.
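For reference, the cycle I'm describing is roughly the following (the pcp
port, user, and node id here are illustrative):
pcp_detach_node -h localhost -p 9898 -U pgpool -n 1   # standby drops to status 3 (down)
pcp_attach_node -h localhost -p 9898 -U pgpool -n 1   # (or auto_failback kicks in) node goes to 1 (waiting)
pcp_node_info   -h localhost -p 9898 -U pgpool -n 1   # stays "waiting", never reaches "up"
And the kind of manual check that does work from the pgpool hosts, so basic
connectivity from pgpool to the backends is there:
psql -h 10.6.1.200 -p 5432 -U pgpool -d nasdw_users -c 'SELECT pg_is_in_recovery();'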
Here’s the log output:
2025-12-03 14:20:49.037: main pid 1085028: LOG: === Starting fail back.
reconnect host 10.6.1.200(5432) ===
2025-12-03 14:20:49.037: main pid 1085028: LOCATION: pgpool_main.c:4169
2025-12-03 14:20:49.037: main pid 1085028: LOG: Node 0 is not down
(status: 2)
2025-12-03 14:20:49.037: main pid 1085028: LOCATION: pgpool_main.c:1524
2025-12-03 14:20:49.038: main pid 1085028: LOG: Do not restart children
because we are failing back node id 1 host: 10.6.1.200 port: 5432 and we
are in streaming replication mode and not all backends were down
2025-12-03 14:20:49.038: main pid 1085028: LOCATION: pgpool_main.c:4370
2025-12-03 14:20:49.038: main pid 1085028: LOG:
find_primary_node_repeatedly: waiting for finding a primary node
2025-12-03 14:20:49.038: main pid 1085028: LOCATION: pgpool_main.c:2896
2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: primary
node is 0
2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:2815
2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node: standby
node is 1
2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:2821
2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new primary
node: 0
2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4660
2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new main
node: 0
2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4667
2025-12-03 14:20:49.189: main pid 1085028: LOG: === Failback done.
reconnect host 10.6.1.200(5432) ===
2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4763
2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOG: worker process
received restart request
2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOCATION:
pool_worker_child.c:182
2025-12-03 14:20:50.189: pcp_main pid 1085087: LOG: restart request
received in pcp child process
2025-12-03 14:20:50.189: pcp_main pid 1085087: LOCATION: pcp_child.c:173
2025-12-03 14:20:50.193: main pid 1085028: LOG: PCP child 1085087 exits
with status 0 in failover()
2025-12-03 14:20:50.193: main pid 1085028: LOCATION: pgpool_main.c:4850
2025-12-03 14:20:50.193: main pid 1085028: LOG: fork a new PCP child pid
1085089 in failover()
2025-12-03 14:20:50.193: main pid 1085028: LOCATION: pgpool_main.c:4854
2025-12-03 14:20:50.193: pcp_main pid 1085089: LOG: PCP process: 1085089
started
2025-12-03 14:20:50.193: pcp_main pid 1085089: LOCATION: pcp_child.c:165
2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOG: process started
2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOCATION:
pgpool_main.c:905
2025-12-03 14:22:31.460: pcp_main pid 1085089: LOG: forked new pcp worker,
pid=1085093 socket=7
2025-12-03 14:22:31.460: pcp_main pid 1085089: LOCATION: pcp_child.c:327
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with pid:
1085093 exit with SUCCESS.
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: pcp_child.c:384
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with pid:
1085093 exits with status 0
2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: pcp_child.c:398
2025-12-03 14:25:39.480: child pid 1085050: LOG: failover or failback
event detected
2025-12-03 14:25:39.480: child pid 1085050: DETAIL: restarting myself
2025-12-03 14:25:39.480: child pid 1085050: LOCATION: child.c:1524
2025-12-03 14:25:39.480: child pid 1085038: LOG: failover or failback
event detected
2025-12-03 14:25:39.481: child pid 1085038: DETAIL: restarting myself
2025-12-03 14:25:39.481: child pid 1085038: LOCATION: child.c:1524
2025-12-03 14:25:39.481: child pid 1085035: LOG: failover or failback
event detected
2025-12-03 14:25:39.481: child pid 1085035: DETAIL: restarting myself
2025-12-03 14:25:39.481: child pid 1085035: LOCATION: child.c:1524
2025-12-03 14:25:39.481: child pid 1085061: LOG: failover or failback
event detected
2025-12-03 14:25:39.481: child pid 1085061: DETAIL: restarting myself
2025-12-03 14:25:39.481: child pid 1085061: LOCATION: child.c:1524
2025-12-03 14:25:39.483: child pid 1085053: LOG: failover or failback
event detected
2025-12-03 14:25:39.483: child pid 1085053: DETAIL: restarting myself
2025-12-03 14:25:39.483: child pid 1085053: LOCATION: child.c:1524
2025-12-03 14:25:39.483: child pid 1085059: LOG: failover or failback
event detected
......over and over and over again.
pcp_node_info output:
10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none
2025-12-03 14:04:39
10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async
2025-12-03 14:04:39
Logs show:
node status[0]: 1
node status[1]: 2
Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets status 2
(up).
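For reference, the equivalent check through pgpool itself would be something
like this (pgpool listen port assumed to be 9999):
psql -h 10.6.1.196 -p 9999 -U pgpool nasdw_users -c 'show pool_nodes'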
*auto_failback behavior:*
- When a node is detached (pcp_detach_node), it goes to status 3 (down)
- auto_failback triggers and moves it to status 1 (waiting)
- Node never transitions from waiting to up
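For what it's worth, the pcp output above already reports the standby as
streaming/async. A manual sanity check of the same thing on the primary is
just the standard pg_stat_replication view, nothing pgpool-specific (the
sr_check user may need pg_monitor membership to see every column):
psql -h 10.6.1.199 -p 5432 -U pgpool -d nasdw_users \
  -c "SELECT application_name, state, sync_state FROM pg_stat_replication;"
If I've wired things up the way backend_application_name expects, the
standby row should show application_name = nasdw_users_2.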
*Key configuration:*
backend_clustering_mode = 'streaming_replication'
backend_hostname0 = '10.6.1.199'
backend_hostname1 = '10.6.1.200'
backend_application_name0 = 'nasdw_users_1'
backend_application_name1 = 'nasdw_users_2'
use_watchdog = on
# 3 watchdog nodes configured
auto_failback = on
auto_failback_interval = 1
sr_check_period = 10
sr_check_user = 'pgpool'
sr_check_database = 'nasdw_users'
health_check_period = 1
health_check_user = 'pgpool'
health_check_database = 'nasdw_users'
failover_when_quorum_exists = on (default)
failover_require_consensus = on (default)
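If I understand backend_application_name correctly, the standby's
primary_conninfo has to carry the matching name; mine is set along these
lines (the replication user name here is a placeholder):
primary_conninfo = 'host=10.6.1.199 port=5432 user=replicator application_name=nasdw_users_2'  # placeholder user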
Cheers,
Adam