From: | Murthy Nunna <mnunna(at)fnal(dot)gov> |
---|---|
To: | "pgsql-admin(at)lists(dot)postgresql(dot)org" <pgsql-admin(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Query Spins |
Date: | 2025-09-05 01:47:15 |
Message-ID: | DM8PR09MB6677162796A489508243437EB803A@DM8PR09MB6677.namprd09.prod.outlook.com |
Lists: | pgsql-admin |
Here is more information:
1) At 5:30pm, the query is fired 5 times from a client.
2) 4 of them finished. One got stuck (it spins, consuming CPU).
3) The underlying tables are large and change frequently, but I have been running autovacuum and analyze aggressively at the table level.
4) max_parallel_workers and max_worker_processes are both set to 8.
I collected the following information shortly after 5:30pm, once all 5 queries had fired:
ps -aef | grep postgres | grep worker
postgres 2162504 1802555 99 17:30 ? 00:00:04 postgres: parallel worker for PID 2162501
postgres 2162505 1802555 99 17:30 ? 00:00:04 postgres: parallel worker for PID 2162501
postgres 2162506 1802555 99 17:30 ? 00:00:04 postgres: parallel worker for PID 2162502
postgres 2162507 1802555 99 17:30 ? 00:00:04 postgres: parallel worker for PID 2162502
postgres 2162508 1802555 99 17:30 ? 00:00:04 postgres: parallel worker for PID 2162503
postgres 2162509 1802555 99 17:30 ? 00:00:04 postgres: parallel worker for PID 2162503
postgres 2162827 1802555 99 17:30 ? 00:00:03 postgres: parallel worker for PID 2162703
It is almost certain that max_parallel_workers has been exhausted. So what? The remaining query should not get stuck!
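The worker listing above can be tallied per leader instead of read by eye. The function below is a minimal sketch, not an official PostgreSQL tool: it assumes the process-title format shown in the ps output (`postgres: parallel worker for PID <leader>`), and `count_parallel_workers` is a hypothetical helper name. It prints how many workers are attached to each leader PID and the total against a max_parallel_workers value passed as an argument.

```shell
# Hypothetical helper: count parallel workers per leader PID from `ps -aef` output.
# Assumes worker titles end in "postgres: parallel worker for PID <leader>".
# Usage: ps -aef | count_parallel_workers 8
count_parallel_workers() {
    max_workers=${1:-8}   # pass your max_parallel_workers setting here
    awk -v max="$max_workers" '
        # The pattern requires digits after "PID " at end of line, so the awk
        # command line itself (visible in ps) and "grep worker" are not counted.
        /postgres: parallel worker for PID [0-9]+$/ {
            leader = $NF          # last field is the leader PID
            count[leader]++
            total++
        }
        END {
            # for-in order is unspecified; pipe through sort if order matters
            for (l in count) printf "leader %s: %d worker(s)\n", l, count[l]
            printf "total: %d of %d\n", total + 0, max
        }'
}
```

Fed the seven worker lines captured at 17:30, it reports two workers each for leaders 2162501, 2162502 and 2162503, one for 2162703, and a total of 7 of 8. On PostgreSQL 13 and later (including the 16.10 in use here), pg_stat_activity also has a leader_pid column, so the same grouping can be done in SQL by filtering on backend_type = 'parallel worker' and grouping by leader_pid.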
The stuck query's leader PID is 2162502. Its parallel workers are long gone; neither worker PID shows up in ps anymore:
ps -aef | grep 2162506 | grep -v grep
ps -aef | grep 2162507 | grep -v grep
The following is from pg_stat_activity:
pid | 2162502
client_port | 37264
xact_start | 2025-09-04 17:30
query_start | 2025-09-04 17:30
state_change | 2025-09-04 17:30
wait_event |
state | active
now | 2025-09-04 20:10
time_runnning | 02:40:00.509246
The interesting part is the temp files: their last-modified timestamp keeps changing, but their size does not. File pgsql_tmp2162502.1 has been at 242155520 bytes for a long time, yet its modification time keeps advancing.
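The pattern described above (size frozen, modification time advancing) can be checked mechanically instead of by rerunning `ls -ltr`. This is a minimal sketch, assuming GNU coreutils `stat` on Linux (the `-c` format option differs on BSD and macOS); `watch_temp_file` is a hypothetical helper name. It samples the file's size and mtime twice, a given number of seconds apart, and reports which of the two changed.

```shell
# Hypothetical helper: sample a file's size and mtime twice and report what changed.
# Assumes GNU coreutils stat: %s = size in bytes, %Y = mtime in seconds since epoch.
# Usage: watch_temp_file base/pgsql_tmp/pgsql_tmp2162502.1 60
watch_temp_file() {
    file=$1
    interval=${2:-60}                        # seconds between the two samples
    sample=$(stat -c '%s %Y' "$file") || return 1
    set -- $sample                           # $1 = size, $2 = mtime
    size1=$1 mtime1=$2
    sleep "$interval"
    sample=$(stat -c '%s %Y' "$file") || return 1
    set -- $sample
    size2=$1 mtime2=$2
    if [ "$size1" -eq "$size2" ] && [ "$mtime1" -ne "$mtime2" ]; then
        echo "size unchanged ($size2 bytes), mtime advanced"
    elif [ "$size1" -ne "$size2" ]; then
        echo "size changed: $size1 -> $size2 bytes"
    else
        echo "no change"
    fi
}
```

Run from the data directory, a result of "size unchanged ..., mtime advanced" for pgsql_tmp2162502.1 would confirm the observation above over a known interval.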
ls -ltr base/pgsql_tmp/
total 277864
-rw-------. 1 postgres postgres 16359424 Sep 4 17:30 pgsql_tmp2162502.0
-rw-------. 1 postgres postgres 242155520 Sep 4 20:21 pgsql_tmp2162502.1
-----Original Message-----
From: Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at>
Sent: Monday, September 1, 2025 2:52 AM
To: Murthy Nunna <mnunna(at)fnal(dot)gov>; pgsql-admin(at)lists(dot)postgresql(dot)org
Subject: Re: Query Spins
On Mon, 2025-09-01 at 03:52 +0000, Murthy Nunna wrote:
> Pg16.10
>
> I have a query which runs fine most of the time. When it runs fine, it
> spawns parallel workers. In pg_stat_activity, wait_event is blank,
> state is active and backend_type = "client backend" for the main
> query. For parallel workers of this query I see wait_event =
> MessageQueueSend, state is active and backend_type = "parallel worker"
>
> But sometimes it has no parallel workers: wait_event is blank, state
> is active and backend_type = "client backend". And it never ends. It takes up a lot of CPU.
> The sockets on both the server and the client are in ESTABLISHED state
> (netstat -tulpa | grep <client_port>).
Perhaps no workers are spawned because "max_parallel_workers" has already been exhausted by other backends. Check for the number of concurrent parallel workers next time you get the error.
I cannot know the source of your performance problems, but perhaps it is a combination of system overload and lack of available parallel worker processes, which might well go together.
Yours,
Laurenz Albe