| From: | Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> |
|---|---|
| To: | maxim(dot)boguk(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org |
| Subject: | Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons. |
| Date: | 2026-06-22 20:22:53 |
| Message-ID: | CAK-MWwRVb7Lz14uJNeiggM8O15Y=QLRny9evxec2Pquu5+DwBg@mail.gmail.com |
| Lists: | pgsql-bugs |
On Tue, Jun 2, 2026 at 9:51 PM Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com> wrote:
>
>
> On Tue, Jun 2, 2026 at 9:37 PM PG Bug reporting form <
> noreply(at)postgresql(dot)org> wrote:
>
>> The following bug has been logged on the website:
>>
>> Bug reference: 19505
>> Logged by: Maxim Boguk
>> Email address: maxim(dot)boguk(at)gmail(dot)com
>> PostgreSQL version: 18.4
>> Operating system: Ubuntu 24.04.4 LTS
>> Description:
>>
>> I started investigating this issue after finding that the postgresql
>> process count on my replica sometimes jumps to 200k+ (with
>> max_connections=1000 and fewer than 100 real connections most of the time).
>> Somehow a single query (seemingly random, but always heavy/analytical)
>> spawns thousands of threads and tens of thousands of parallel workers.
>>
>> After some logging I caught one snapshot (ps -u postgres -L -o
>> pid,tid,ppid,lstart,args -ww 2 ) with 39257 processes:
>>
>> [postgres(at)db ~/tmp]$ zcat ps-L-2026-06-02_17-40-22.gz | wc -l
>> 39257
>>
>> Main content is:
>> PID TID PPID StartTime command
>> 2158552 2158552 948705 Tue Jun 2 17:40:17 2026 postgres: 18/main: background_shared db [local] SELECT
>>
>> Then:
>> The same PID but 1620 different TIDS.
>> PID TID PPID StartTime command
>> #main process
>> 2158557 2158557 948705 Tue Jun 2 17:40:18 2026 postgres: 18/main: background_shared db [local] SELECT
>> #1620 threads
>> 2158557 2158607 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: background_shared db [local] SELECT
>> 2158557 2158608 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: background_shared db [local] SELECT
>> 2158557 2158609 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: background_shared db [local] SELECT
>>
>> Then, 37571 rows!!! of:
>> PID TID PPID StartTime command
>> 2158579 2159176 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557
>> 2158579 2159179 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557
>> 2158579 2159183 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557
>> 2158579 2159196 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557
>> 2158579 2159198 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557
>> 2158579 2159202 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main: parallel worker for PID 2158557
>>
>> I double-checked the query (it was logged in the database log): it runs
>> with 6 worker processes and without any issues when run manually.
>>
>> Related db configuration:
>> max_connections = 1000
>> max_worker_processes = 128 # (change requires restart)
>> max_parallel_workers_per_gather = 16 # limited by max_parallel_workers
>> max_parallel_workers = 64
>> io_method = io_uring # worker, io_uring, sync
>> io_max_concurrency = -1 # Max number of IOs that one process
>> jit = on (usual suspect in case of weird things going on)
>>
>> Given that this happens about 1-10 times per hour (and leads to short
>> load average spikes of up to 10000), it seriously affects the database
>> replica's performance.
>>
>> No external/non-standard/C extensions except pgq and postgis are loaded
>> into the database.
>>
>> I can look for any additional information and perform any local research
>> but currently I'm out of ideas what my next steps should be.
>>
>> PS: it seems that the issue can be triggered by different queries,
>> not by one particular query.
>
>
Update: the issue was triggered by unconstrained spawning of helper threads
with io_method=io_uring
(thousands to tens of thousands of "iou-wrk-****" helper threads per bitmap scan).
Switching to io_method=worker fixed the problem.
It seems io_uring has some unexpected issues with unconstrained thread spawning.
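
For anyone who wants to check a system for the same pattern, here is a
minimal sketch (not the procedure used above) that counts "iou-wrk-" threads
per PID. It assumes the input is the text output of something like
`ps -u postgres -L -o pid,tid,comm`, where the third column carries the
per-thread name; the function name and the sample PIDs below are
hypothetical and for illustration only.

```python
from collections import Counter


def count_io_uring_workers(ps_output: str, prefix: str = "iou-wrk-") -> Counter:
    """Count threads per PID whose thread name starts with `prefix`.

    Assumes three whitespace-separated columns (pid, tid, thread name)
    and an optional header line, as printed by `ps -L -o pid,tid,comm`.
    """
    counts = Counter()
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        # Skip the header line and blank or malformed lines.
        if len(fields) < 3 or not fields[0].isdigit():
            continue
        pid, _tid, comm = fields
        if comm.strip().startswith(prefix):
            counts[int(pid)] += 1
    return counts


# Made-up sample data, not taken from the snapshot in this report.
sample = """\
PID TID COMMAND
100 100 postgres
100 101 iou-wrk-100
100 102 iou-wrk-100
200 200 postgres
"""

for pid, n in count_io_uring_workers(sample).most_common():
    print(f"pid={pid} io_uring worker threads={n}")
```

Note that the snapshot in the original report used `-o ...,args`, which
prints the process command line for every thread, so the helper threads
looked like ordinary backends there; selecting `comm` instead should make
the thread names visible.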
--
Maxim Boguk
Senior Postgresql DBA
Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678