| From: | Adam Blomeke <adam(dot)blomeke(at)gmail(dot)com> |
|---|---|
| To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
| Cc: | pgpool-general(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Pgpool can't detect database status properly |
| Date: | 2025-12-16 15:57:16 |
| Message-ID: | CAG9AmsigfRELb2B1K17vJ7YB_5xw0eVXY9AUmH3vj74RcLY08w@mail.gmail.com |
| Lists: | pgpool-general |
Thanks for the reply.
Your response makes sense: I'm still setting this cluster up, so it's just
me trying to connect to it.
I'm curious, then, what the right process is when I need to pull a node
out of the cluster for maintenance (e.g. patching). I was under the
impression that I should detach the node, do a pg_rewind, manually set it
as a standby if it was the primary, and then attach the node back in
pgpool (roughly the sequence sketched below). I guess I can't do that with
auto_failback turned on?
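
For reference, here's roughly the sequence I had in mind, as a rough sketch
(node id 1 = 10.6.1.200 and the hostnames are from the cluster described
below, 9898 is the default pcp port, and the data directory path and the
pg_rewind connection string are just placeholders from my setup; the
pg_rewind/standby steps would only apply if the node being patched had been
the primary):

# 0. Presumably turn auto_failback off and reload pgpool first?
#    Otherwise I assume it just re-attaches the node right away.

# 1. Detach the node from pgpool (node id 1 = 10.6.1.200)
pcp_detach_node -h 10.6.1.196 -p 9898 -U pgpool -n 1

# 2. Patch the server. If this node had been the primary, rewind it
#    against the new primary and mark it as a standby before starting
#    it again:
pg_rewind --target-pgdata=/var/lib/pgsql/18/data \
          --source-server='host=10.6.1.199 user=postgres dbname=postgres'
touch /var/lib/pgsql/18/data/standby.signal   # plus primary_conninfo

# 3. Start PostgreSQL, then re-attach the node in pgpool
pcp_attach_node -h 10.6.1.196 -p 9898 -U pgpool -n 1

Is that the right general shape?
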
Cheers,
Adam
On Mon, Dec 15, 2025 at 6:43 PM Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
> > I'm resending this as it's been sitting in the moderation queue for a
> > while. Possibly because I didn't have a subject line? Anyways, any help
> > would be great. Thanks!
>
> I received your email this time.
>
> > I’m setting up a pgpool cluster to replace a single-node database in my
> > environment. The single node is separate from the cluster at the moment.
> > When it’s time to implement the DB, I’m going to redo the backup/restore,
> > run an upgrade from pg15 to pg18, and then bring the cluster online and
> > take over the old IP.
> >
> >
> >
> > *Environment:*
> >
> > - pgpool-II version: 4.6.3 (chirikoboshi)
> > - PostgreSQL version: 18
> > - OS: RHEL9
> > - Cluster topology: 3 pgpool nodes (10.6.1.196, 10.6.1.197, 10.6.1.198)
> >   + 2 PostgreSQL nodes (10.6.1.199 primary, 10.6.1.200 standby)
> >
> >
> >
> > *Issue:*
> >
> > I have pgpool configured and I’ve set it up using the scripts and config
> > files from a different instance, one which has been running just fine
> > for a year and a half or so. The issue I’m experiencing is that when I
> > detach/reattach a node, it sits in waiting constantly. It never
> > transitions to up.
>
> If you connect to 10.6.1.196 (or 10.6.1.197, 10.6.1.198) using psql
> and issue an SQL command, for example "SELECT 1", does it work? If it
> works, it means pgpool works fine.
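>
> For example (assuming pgpool is listening on its default port 9999, and
> using the same user/database as your health check settings):
>
> psql -h 10.6.1.196 -p 9999 -U pgpool -d nasdw_users -c "SELECT 1;"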
>
> > I have to manually change the status file to up to get it to agree
> > that it is,
>
> The pgpool status "waiting" means that the backend node has not
> received any query from pgpool clients yet. You can safely assume
> that pgpool is up and running. Once pgpool receives queries, the
> status should change from "waiting" to "up".
>
> > and when I try to drop the node it doesn't actually drop
> > it. It just goes into waiting again.
>
> Sounds like an effect of auto failback, because you set:
>
> auto_failback_interval = 1
>
> so pgpool brings the node back online almost immediately.
>
> > I also don’t see any connection attempts from the pgpool server to the
> > postgres nodes if I look at postgres logs. I've confirmed that it can
> > run the postgres commands from the command line. I've tried this both
> > running pgpool as a service and running it directly from the command
> > line. No difference in behavior.
>
> Probably there's something wrong in the configuration, or pgpool is
> trying to connect to the wrong IP and/or port. Please turn on
> log_client_messages and log_per_node_statement, then send an SQL
> command to pgpool, and examine the pgpool log.
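>
> For example, something like this in pgpool.conf, then reload (or
> restart) pgpool:
>
> log_client_messages = on
> log_per_node_statement = on
>
> After that, each statement pgpool sends to a backend is logged with the
> backend node id, so you can see where (if anywhere) the queries go.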
>
> > Here’s the log output:
> >
> > 2025-12-03 14:20:49.037: main pid 1085028: LOG: === Starting fail back.
> > reconnect host 10.6.1.200(5432) ===
> >
> > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION: pgpool_main.c:4169
> >
> > 2025-12-03 14:20:49.037: main pid 1085028: LOG: Node 0 is not down
> > (status: 2)
> >
> > 2025-12-03 14:20:49.037: main pid 1085028: LOCATION: pgpool_main.c:1524
> >
> > 2025-12-03 14:20:49.038: main pid 1085028: LOG: Do not restart children
> > because we are failing back node id 1 host: 10.6.1.200 port: 5432 and we
> > are in streaming replication mode and not all backends were down
> >
> > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION: pgpool_main.c:4370
> >
> > 2025-12-03 14:20:49.038: main pid 1085028: LOG:
> > find_primary_node_repeatedly: waiting for finding a primary node
> >
> > 2025-12-03 14:20:49.038: main pid 1085028: LOCATION: pgpool_main.c:2896
> >
> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node:
> > primary node is 0
> >
> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:2815
> >
> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: find_primary_node:
> > standby node is 1
> >
> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:2821
> >
> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new
> > primary node: 0
> >
> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4660
> >
> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: failover: set new main
> > node: 0
> >
> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4667
> >
> > 2025-12-03 14:20:49.189: main pid 1085028: LOG: === Failback done.
> > reconnect host 10.6.1.200(5432) ===
> >
> > 2025-12-03 14:20:49.189: main pid 1085028: LOCATION: pgpool_main.c:4763
> >
> > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOG: worker
> > process received restart request
> >
> > 2025-12-03 14:20:49.189: sr_check_worker pid 1085088: LOCATION:
> > pool_worker_child.c:182
> >
> > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOG: restart request
> > received in pcp child process
> >
> > 2025-12-03 14:20:50.189: pcp_main pid 1085087: LOCATION: pcp_child.c:173
> >
> > 2025-12-03 14:20:50.193: main pid 1085028: LOG: PCP child 1085087 exits
> > with status 0 in failover()
> >
> > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION: pgpool_main.c:4850
> >
> > 2025-12-03 14:20:50.193: main pid 1085028: LOG: fork a new PCP child pid
> > 1085089 in failover()
> >
> > 2025-12-03 14:20:50.193: main pid 1085028: LOCATION: pgpool_main.c:4854
> >
> > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOG: PCP process: 1085089
> > started
> >
> > 2025-12-03 14:20:50.193: pcp_main pid 1085089: LOCATION: pcp_child.c:165
> >
> > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOG: process started
> >
> > 2025-12-03 14:20:50.194: sr_check_worker pid 1085090: LOCATION:
> > pgpool_main.c:905
> >
> > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOG: forked new pcp
> > worker, pid=1085093 socket=7
> >
> > 2025-12-03 14:22:31.460: pcp_main pid 1085089: LOCATION: pcp_child.c:327
> >
> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with
> > pid: 1085093 exit with SUCCESS.
> >
> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: pcp_child.c:384
> >
> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOG: PCP process with
> > pid: 1085093 exits with status 0
> >
> > 2025-12-03 14:22:31.721: pcp_main pid 1085089: LOCATION: pcp_child.c:398
> >
> > 2025-12-03 14:25:39.480: child pid 1085050: LOG: failover or failback
> > event detected
> >
> > 2025-12-03 14:25:39.480: child pid 1085050: DETAIL: restarting myself
> >
> > 2025-12-03 14:25:39.480: child pid 1085050: LOCATION: child.c:1524
> >
> > 2025-12-03 14:25:39.480: child pid 1085038: LOG: failover or failback
> > event detected
> >
> > 2025-12-03 14:25:39.481: child pid 1085038: DETAIL: restarting myself
> >
> > 2025-12-03 14:25:39.481: child pid 1085038: LOCATION: child.c:1524
> >
> > 2025-12-03 14:25:39.481: child pid 1085035: LOG: failover or failback
> > event detected
> >
> > 2025-12-03 14:25:39.481: child pid 1085035: DETAIL: restarting myself
> >
> > 2025-12-03 14:25:39.481: child pid 1085035: LOCATION: child.c:1524
> >
> > 2025-12-03 14:25:39.481: child pid 1085061: LOG: failover or failback
> > event detected
> >
> > 2025-12-03 14:25:39.481: child pid 1085061: DETAIL: restarting myself
> >
> > 2025-12-03 14:25:39.481: child pid 1085061: LOCATION: child.c:1524
> >
> > 2025-12-03 14:25:39.483: child pid 1085053: LOG: failover or failback
> > event detected
> >
> > 2025-12-03 14:25:39.483: child pid 1085053: DETAIL: restarting myself
> >
> > 2025-12-03 14:25:39.483: child pid 1085053: LOCATION: child.c:1524
> >
> > 2025-12-03 14:25:39.483: child pid 1085059: LOG: failover or failback
> > event detected
> >
> > ......over and over and over again.
> >
> >
> >
> >
> > pcp_node_info output:
> >
> > 10.6.1.199 5432 1 0.500000 waiting up primary primary 0 none none
> > 2025-12-03 14:04:39
> >
> > 10.6.1.200 5432 1 0.500000 waiting up standby standby 0 streaming async
> > 2025-12-03 14:04:39
> >
> > Logs show:
> >
> > node status[0]: 1
> >
> > node status[1]: 2
> >
> > Node 0 (primary) gets status 1 (waiting), node 1 (standby) gets status 2
> > (up).
>
> No, this does not show the backend status. Instead, it says
>
> > node status[0]: 1
>
> This means backend 0 is primary.
>
> > node status[1]: 2
>
> This means backend 1 is standby.
>
> > *auto_failback behavior:*
> >
> > - When a node is detached (pcp_detach_node), it goes to status 3 (down)
> > - auto_failback triggers and moves it to status 1 (waiting)
> > - Node never transitions from waiting to up
>
> Sounds like pgpool has not received queries.
>
> > *Key configuration:*
> >
> > backend_clustering_mode = 'streaming_replication'
> >
> > backend_hostname0 = '10.6.1.199'
> >
> > backend_hostname1 = '10.6.1.200'
> >
> > backend_application_name0 = 'nasdw_users_1'
> >
> > backend_application_name1 = 'nasdw_users_2'
> >
> >
> >
> > use_watchdog = on
> >
> > # 3 watchdog nodes configured
> >
> >
> >
> > auto_failback = on
> >
> > auto_failback_interval = 1
> >
> >
> >
> > sr_check_period = 10
> >
> > sr_check_user = 'pgpool'
> >
> > sr_check_database = 'nasdw_users'
> >
> >
> >
> > health_check_period = 1
> >
> > health_check_user = 'pgpool'
> >
> > health_check_database = 'nasdw_users'
> >
> >
> >
> > failover_when_quorum_exists = on (default)
> >
> > failover_require_consensus = on (default)
> > Cheers,
> > Adam
>