Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons.

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: maxim(dot)boguk(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons.
Date: 2026-06-22 20:22:53
Message-ID: CAK-MWwRVb7Lz14uJNeiggM8O15Y=QLRny9evxec2Pquu5+DwBg@mail.gmail.com
Lists: pgsql-bugs

On Tue, Jun 2, 2026 at 9:51 PM Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> wrote:

>
>
> On Tue, Jun 2, 2026 at 9:37 PM PG Bug reporting form <
> noreply(at)postgresql(dot)org> wrote:
>
>> The following bug has been logged on the website:
>>
>> Bug reference: 19505
>> Logged by: Maxim Boguk
>> Email address: maxim(dot)boguk(at)gmail(dot)com
>> PostgreSQL version: 18.4
>> Operating system: Ubuntu 24.04.4 LTS
>> Description:
>>
>> I started investigating this issue after finding that the PostgreSQL process
>> count on my replica sometimes jumps to 200k+ (with max_connections=1000
>> and real connections under 100 most of the time).
>> Somehow a single query (seemingly random, but always heavy/analytical) spawns
>> thousands of threads and tens of thousands of parallel workers.
>>
>> After some logging I caught one snapshot (ps -u postgres -L -o
>> pid,tid,ppid,lstart,args -ww 2 ) with 39257 processes:
>>
>> [postgres(at)db ~/tmp]$ zcat ps-L-2026-06-02_17-40-22.gz | wc -l
>> 39257
>>
>> Main content is:
>> PID TID PPID StartTime command
>> 2158552 2158552 948705 Tue Jun 2 17:40:17 2026 postgres: 18/main: background_shared db [local] SELECT
>>
>> Then:
>> The same PID but 1620 different TIDS.
>> PID TID PPID StartTime command
>> #main process
>> 2158557 2158557 948705 Tue Jun 2 17:40:18 2026 postgres: 18/main: background_shared db [local] SELECT
>> #1620 threads
>> 2158557 2158607 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: background_shared db [local] SELECT
>> 2158557 2158608 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: background_shared db [local] SELECT
>> 2158557 2158609 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: background_shared db [local] SELECT
>>
>> Then, 37571 rows!!! of:
>> PID TID PPID StartTime command
>> 2158579 2159176 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557
>> 2158579 2159179 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557
>> 2158579 2159183 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557
>> 2158579 2159196 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557
>> 2158579 2159198 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557
>> 2158579 2159202 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557
>>
>> I double-checked the query (it had been logged in the database log): it runs
>> with 6 worker processes and without any issues when run manually.
>>
>> Related db configuration:
>> max_connections = 1000
>> max_worker_processes = 128 # (change requires restart)
>> max_parallel_workers_per_gather = 16 # limited by max_parallel_workers
>> max_parallel_workers = 64
>> io_method = io_uring # worker, io_uring, sync
>> io_max_concurrency = -1 # Max number of IOs that one process
>> jit = on (usual suspect in case of weird things going on)
>>
>> Given that this happens about 1-10 times per hour (and leads to short
>> load average spikes of up to 10000), it seriously affects the replica's
>> performance.
>>
>> No external/non-standard/C extensions except pgq and postgis are loaded
>> into the database.
>>
>> I can look for any additional information and perform any local research
>> but currently I'm out of ideas what my next steps should be.
>>
>> PS: it seems that the issue can be triggered by different queries, not
>> by one particular query.
>
>
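
A minimal Python sketch for reducing a snapshot like the one quoted above to a
row (thread) count per PID, which makes it easier to see how the rows are
distributed across processes. It assumes the `ps -L` column order shown above
(PID first, then TID); the function names are hypothetical, and this is not the
tooling used to produce the numbers in the report.

```python
import gzip
from collections import Counter


def threads_per_pid(lines):
    """Count ps -L rows (one row per thread) for each PID."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # A data row starts with two numeric fields (PID, TID).
        # Headers, blank lines and wrapped continuation lines are skipped.
        if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
            continue
        counts[int(fields[0])] += 1
    return counts


def summarise_snapshot(path, top=10):
    """Return the `top` PIDs with the most threads in a gzipped snapshot."""
    with gzip.open(path, "rt") as f:
        return threads_per_pid(f).most_common(top)
```

Calling `summarise_snapshot("ps-L-2026-06-02_17-40-22.gz")` on a snapshot file
would list the PIDs that own the most threads, as (PID, count) pairs.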

Update: the issue was triggered by the unconstrained spawning of helper threads
with io_method=io_uring
(thousands to tens of thousands of "iou-wrk-****" helper threads per bitmap scan).
Switching to io_method=worker fixed the problem.

It seems io_uring has some unexpected issues with unconstrained thread spawning.
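
To check whether a given backend is affected, the helper threads can be counted
by name. The following is only a sketch: it assumes the helper threads are
listed under /proc/<pid>/task with a thread name starting with "iou-wrk" (as
the names above suggest), and `count_iou_workers` is a hypothetical helper, not
part of PostgreSQL.

```python
import os


def count_iou_workers(pid, proc_root="/proc", prefix="iou-wrk"):
    """Return how many threads of `pid` have a name starting with `prefix`."""
    task_dir = os.path.join(proc_root, str(pid), "task")
    count = 0
    for tid in os.listdir(task_dir):
        try:
            # Each thread exposes its name in the per-task "comm" file.
            with open(os.path.join(task_dir, tid, "comm")) as f:
                if f.read().strip().startswith(prefix):
                    count += 1
        except OSError:
            # The thread exited between listdir() and open(); ignore it.
            continue
    return count
```

For example, `count_iou_workers(2158557)` would report the number of such
threads for that backend while it is still running. The `proc_root` parameter
exists only so the function can be pointed at a test directory.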

--
Maxim Boguk
Senior Postgresql DBA

Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678
