Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons.

From: Maxim Boguk <maxim(dot)boguk(at)gmail(dot)com>
To: maxim(dot)boguk(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons.
Date: 2026-06-02 18:51:43
Message-ID: CAK-MWwR6vm=0yg9wTovhsTUcSRnHJrBZ2X-d-KHBtyRM0j=qaw@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Tue, Jun 2, 2026 at 9:37 PM PG Bug reporting form <noreply(at)postgresql(dot)org>
wrote:

> The following bug has been logged on the website:
>
> Bug reference: 19505
> Logged by: Maxim Boguk
> Email address: maxim(dot)boguk(at)gmail(dot)com
> PostgreSQL version: 18.4
> Operating system: Ubuntu 24.04.4 LTS
> Description:
>
> I started investigation of this issue after found that process count of
> postgresql on my replica sometime jump to 200k+ (with max_connections=1000
> and real connections under 100 most time).
> Somehow single (seems random by always heavy/analytical) query spawn
> thousands of the threads and tens thousands of parallel workers.
>
> After some logging I caught one snapshot (ps -u postgres -L -o
> pid,tid,ppid,lstart,args -ww 2 ) with 39257 processes:
>
> [postgres(at)db ~/tmp]$ zcat ps-L-2026-06-02_17-40-22.gz | wc -l
> 39257
>
> Main content is:
> PID TID PPID StartTime
> command
> 2158552 2158552 948705 Tue Jun 2 17:40:17 2026 postgres: 18/main:
> background_shared db [local] SELECT
>
> Then:
> The same PID but 1620 different TIDS.
> PID TID PPID StartTime
> command
> #main process
> 2158557 2158557 948705 Tue Jun 2 17:40:18 2026 postgres: 18/main:
> background_shared db [local] SELECT
> #1620 threads
> 2158557 2158607 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main:
> background_shared db [local] SELECT
> 2158557 2158608 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main:
> background_shared db [local] SELECT
> 2158557 2158609 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main:
> background_shared db [local] SELECT
>
> Then, 37571 rows!!! of:
> PID TID PPID StartTime
> command
> 2158579 2159176 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main:
> parallel
> worker for PID 2158557
> 2158579 2159179 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main:
> parallel
> worker for PID 2158557
> 2158579 2159183 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main:
> parallel
> worker for PID 2158557
> 2158579 2159196 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main:
> parallel
> worker for PID 2158557
> 2158579 2159198 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main:
> parallel
> worker for PID 2158557
> 2158579 2159202 948705 Tue Jun 2 17:40:20 2026 postgres: 18/main:
> parallel
> worker for PID 2158557
>
> I double checked the query (it had been logged in database log): it run
> with
> 6 worker processes and without any issues on manual run.
>
> Related db configuration:
> max_connections = 1000
> max_worker_processes = 128 # (change requires restart)
> max_parallel_workers_per_gather = 16 # limited by max_parallel_workers
> max_parallel_workers = 64
> io_method = io_uring # worker, io_uring, sync
> io_max_concurrency = -1 # Max number of IOs that one process
> jit = on (usual suspect in case of weird things going on)
>
> Given that situation happens like 1-10 times per hour (and lead for short
> LA
> spikes up to 10000) - it's seriously affect the database replica
> performance.
>
> No external/non-standard/C extensions except of pgq and postgis loaded into
> the database.
>
> I can look for any additional information and perform any local research
> but currently I'm out of ideas what my next steps should be.
>
> PS: it's seems that the issue could be triggered by different queries, but
> not the one particular.
>
>
UPD:

Looks like the issue is always triggered by queries of the same kind (below
query and explain),
but I can't see any way how this query can spawn 10k-100k threads (or spawn
any threads at all).
PS: The problem started after upgrading 17->18, so the next suspect I am
going to test is "io_method = io_uring".

explain analyze WITH _tmp as (
SELECT
"job_stats_master"."device", "job_stats_master"."organic",
"job_stats_master"."paid", "job_stats_master"."global_action",
"job_stats_master"."employer_id",
SUM(COALESCE(stats_count, 1)), sum(coalesce(price, 0)) AS spent,
DATE(job_stats_master.created_at AT TIME ZONE 'UTC' AT TIME ZONE
'Etc/UTC'),
coalesce(SUM(price) filter(where bid_metric_matches(bid_metric,
global_action) AND paid AND NOT organic), 0) earned,
"job_stats_master"."job_board_id", "job_stats_master"."job_group_id",
"job_stats_master"."country_id", "job_stats_master"."gcc",
"job_stats_master"."exchange_id", "job_stats_master"."bid_metric",
sum(coalesce(price_publisher, 0)), sum(coalesce(cpc, 0)),
sum(coalesce(price, 0) * coalesce(ncc, 1)) AS spent_ncc,
sum(coalesce(price, 0) / coalesce(gcc, 1)) AS spent_gcc,
sum(coalesce(price_publisher, 0)) AS publisher_spent
FROM "job_stats_master"
WHERE
"job_stats_master"."job_board_id" = 30875
AND (job_stats_master.created_at >= '2026-05-01 00:00:00')
AND (job_stats_master.created_at <= '2026-05-31 23:59:59.999999')
AND ((job_stats_master.unpaid_reason NOT IN (7, 9, 19) OR
job_stats_master.unpaid_reason IS NULL))
AND "job_stats_master"."employer_id" NOT IN (20894, 28518, 28508, 28517,
20691, 24731, 28280, 20692, 20690, 28281, 28509, 20700, 28284, 28520,
20895, 24730, 20698, 20699, 28285, 28519, 28521, 28613, 28607, 28612,
28602, 28603, 28604, 28610, 28605, 28606, 28608, 28609, 28614, 28615,
28611, 28629, 28630, 28625, 28626, 28627, 28633, 28632, 28622, 28624,
28620, 28621, 28623, 28628, 28631, 28652, 28653, 28654, 28658, 28656,
28647, 28648, 28651, 28649, 28655, 28650, 28659, 28657, 28660, 20893,
28279, 19292, 28527, 19291, 28278, 24707, 19290, 28526, 28525, 18157,
18158, 18156, 18155, 28255, 28256, 28258, 28259, 28260, 24689, 28262,
28265, 28268, 28267, 24688, 28257, 22774, 20701, 28283, 21597, 24729,
20702, 22781, 20703, 20896, 28282, 28524, 22780, 28522, 28523, 7400, 7397,
7399, 8118, 8110, 7402, 6706, 6700, 6703, 8113, 8108, 9457, 8109, 9453,
7068, 8114, 7401, 9456, 6866, 8117, 9455, 8115, 7398, 6707, 7403, 9454,
7404, 8116, 6708, 21129, 21130, 21131, 21132, 24369, 24364, 24368, 24365,
24367, 24366, 18576, 18574, 18575, 18573, 24433, 21414, 2887, 60, 62, 61,
10381, 20676, 20677, 20851, 20972, 18220, 18219, 18221, 18222, 3182, 3158,
3157, 5213, 5212, 3156, 3839, 3840, 6919, 5818, 6951, 9502, 3841, 10489,
10490, 10491, 15231, 10493, 15232, 10492, 9189, 9185, 9187, 9188, 9190,
9186, 8898, 8900, 8897, 8899, 8895, 8896, 8912, 8909, 8914, 8913, 8911,
8910, 18184, 18182, 18185, 18183, 10858, 20706, 10861, 10859, 10857, 10860,
19570, 8906, 8908, 8907, 8904, 8905, 8903, 7686, 7685, 7684, 7682, 7687,
7683, 7689, 7692, 7688, 7693, 7690, 7691, 9503, 8225, 11593, 8223, 8227,
8222, 8226, 8224, 3295, 3294, 4151, 10468, 3293, 4081, 5668, 3153, 3152,
3154, 3480, 3148, 3150, 3482, 3149, 3291, 3292, 13708, 3987, 3290, 3533,
20675, 10298, 10290, 10297, 10299, 10296, 3286, 3284, 5320, 3951, 3907,
3459, 3285, 3529, 3289, 3288, 3287, 6894, 19552, 7873, 6240, 21467, 6816,
17328, 4598, 4599, 4600, 18244)
GROUP BY "job_stats_master"."device", "job_stats_master"."organic",
"job_stats_master"."paid", "job_stats_master"."global_action",
"job_stats_master"."employer_id", "job_stats_master"."job_board_id",
"job_stats_master"."job_group_id", "job_stats_master"."country_id",
DATE(job_stats_master.created_at AT TIME ZONE 'UTC' AT TIME ZONE
'Etc/UTC'), "job_stats_master"."gcc", "job_stats_master"."exchange_id",
"job_stats_master"."bid_metric"
)
SELECT employers.enterprise_id AS tier0_enterprise_id, _tmp.*
FROM _tmp
INNER JOIN employers on employers.id = _tmp.employer_id
;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Hash Join (cost=117116.37..156282.19 rows=556873 width=280) (actual
time=626.650..1874.724 rows=313222.00 loops=1)
Hash Cond: (job_stats_master.employer_id = employers.id)
Buffers: shared hit=505859
-> Finalize GroupAggregate (cost=115548.94..154129.81 rows=556873
width=276) (actual time=593.246..1799.007 rows=313222.00 loops=1)
Group Key: job_stats_master.device, job_stats_master.organic,
job_stats_master.paid, job_stats_master.global_action,
job_stats_master.employer_id, job_stats_master.job_group_id,
job_stats_master.country_id, (date(((job_stats_master.created_at AT TIME
ZONE 'UTC'::text) AT TIME ZONE 'Etc/UTC'::text))), job_stats_master.gcc,
job_stats_master.exchange_id, job_stats_master.bid_metric
Buffers: shared hit=495471
-> Gather Merge (cost=115548.94..130184.30 rows=556872
width=244) (actual time=593.228..1242.326 rows=475158.00 loops=1)
Workers Planned: 6
Workers Launched: 6
Buffers: shared hit=495471
-> Partial GroupAggregate (cost=105548.90..110653.56
rows=92812 width=244) (actual time=569.484..682.066 rows=67879.71 loops=7)
Group Key: job_stats_master.device,
job_stats_master.organic, job_stats_master.paid,
job_stats_master.global_action, job_stats_master.employer_id,
job_stats_master.job_group_id, job_stats_master.country_id,
(date(((job_stats_master.created_at AT TIME ZONE 'UTC'::text) AT TIME ZONE
'Etc/UTC'::text))), job_stats_master.gcc, job_stats_master.exchange_id,
job_stats_master.bid_metric
Buffers: shared hit=495471
-> Sort (cost=105548.90..105641.71 rows=92812
width=72) (actual time=569.430..580.918 rows=87120.71 loops=7)
Sort Key: job_stats_master.device,
job_stats_master.organic, job_stats_master.paid,
job_stats_master.global_action, job_stats_master.employer_id,
job_stats_master.job_group_id, job_stats_master.country_id,
(date(((job_stats_master.created_at AT TIME ZONE 'UTC'::text) AT TIME ZONE
'Etc/UTC'::text))), job_stats_master.gcc, job_stats_master.exchange_id,
job_stats_master.bid_metric
Sort Method: quicksort Memory: 10559kB
Buffers: shared hit=495471
Worker 0: Sort Method: quicksort Memory: 9807kB
Worker 1: Sort Method: quicksort Memory: 9684kB
Worker 2: Sort Method: quicksort Memory: 9818kB
Worker 3: Sort Method: quicksort Memory: 9772kB
Worker 4: Sort Method: quicksort Memory: 9764kB
Worker 5: Sort Method: quicksort Memory: 9702kB
-> Parallel Bitmap Heap Scan on
job_stats_new_2026_05 job_stats_master (cost=34412.46..102021.67
rows=92812 width=72) (actual time=174.110..463.709 rows=87120.71 loops=7)
Recheck Cond: ((job_board_id = 30875) AND
(created_at >= '2026-05-01 00:00:00'::timestamp without time zone) AND
(created_at <= '2026-05-31 23:59:59.999999'::timestamp without time zone))
Filter: (((unpaid_reason <> ALL
('{7,9,19}'::integer[])) OR (unpaid_reason IS NULL)) AND (employer_id <>
ALL
('{20894,28518,28508,28517,20691,24731,28280,20692,20690,28281,28509,20700,28284,28520,20895,24730,20698,20699,28285,28519,28521,28613,28607,28612,28602,28603,28604,28610,28605,28606,28608,28609,28614,28615,28611,28629,28630,28625,28626,28627,28633,28632,28622,28624,28620,28621,28623,28628,28631,28652,28653,28654,28658,28656,28647,28648,28651,28649,28655,28650,28659,28657,28660,20893,28279,19292,28527,19291,28278,24707,19290,28526,28525,18157,18158,18156,18155,28255,28256,28258,28259,28260,24689,28262,28265,28268,28267,24688,28257,22774,20701,28283,21597,24729,20702,22781,20703,20896,28282,28524,22780,28522,28523,7400,7397,7399,8118,8110,7402,6706,6700,6703,8113,8108,9457,8109,9453,7068,8114,7401,9456,6866,8117,9455,8115,7398,6707,7403,9454,7404,8116,6708,21129,21130,21131,21132,24369,24364,24368,24365,24367,24366,18576,18574,18575,18573,24433,21414,2887,60,62,61,10381,20676,20677,20851,20972,18220,18219,18221,18222,3182,3158,3157,5213,5212,3156,3839,3840,6919,5818,6951,9502,3841,10489,10490,10491,15231,10493,15232,10492,9189,9185,9187,9188,9190,9186,8898,8900,8897,8899,8895,8896,8912,8909,8914,8913,8911,8910,18184,18182,18185,18183,10858,20706,10861,10859,10857,10860,19570,8906,8908,8907,8904,8905,8903,7686,7685,7684,7682,7687,7683,7689,7692,7688,7693,7690,7691,9503,8225,11593,8223,8227,8222,8226,8224,3295,3294,4151,10468,3293,4081,5668,3153,3152,3154,3480,3148,3150,3482,3149,3291,3292,13708,3987,3290,3533,20675,10298,10290,10297,10299,10296,3286,3284,5320,3951,3907,3459,3285,3529,3289,3288,3287,6894,19552,7873,6240,21467,6816,17328,4598,4599,4600,18244}'::integer[])))
Rows Removed by Filter: 45
Heap Blocks: exact=76322
Buffers: shared hit=495195
Worker 0: Heap Blocks: exact=69229
Worker 1: Heap Blocks: exact=67759
Worker 2: Heap Blocks: exact=69407
Worker 3: Heap Blocks: exact=68584
Worker 4: Heap Blocks: exact=68597
Worker 5: Heap Blocks: exact=67828
-> Bitmap Index Scan on
job_stats_new_2026_05_job_board_id_job_reference_created_at_idx
(cost=0.00..34356.49 rows=635582 width=0) (actual time=90.252..90.253
rows=610218.00 loops=1)
Index Cond: ((job_board_id = 30875)
AND (created_at >= '2026-05-01 00:00:00'::timestamp without time zone) AND
(created_at <= '2026-05-31 23:59:59.999999'::timestamp without time zone))
Index Searches: 1
Buffers: shared hit=7343
-> Hash (cost=1290.53..1290.53 rows=25173 width=8) (actual
time=33.285..33.292 rows=25173.00 loops=1)
Buckets: 32768 Batches: 1 Memory Usage: 1240kB
Buffers: shared hit=10388
-> Seq Scan on employers (cost=0.00..1290.53 rows=25173 width=8)
(actual time=18.012..30.607 rows=25173.00 loops=1)
Buffers: shared hit=10388
Planning:
Buffers: shared hit=1212
Planning Time: 2.535 ms
JIT:
Functions: 88
Options: Inlining false, Optimization false, Expressions true, Deforming
true
Timing: Generation 11.607 ms (Deform 5.619 ms), Inlining 0.000 ms,
Optimization 7.214 ms, Emission 90.846 ms, Total 109.668 ms
Execution Time: 1902.285 ms

--
Maxim Boguk
Senior Postgresql DBA

Phone UA: +380 99 143 0000
Phone AU: +61 45 218 5678

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2026-06-02 19:21:45 Re: GREATEST/LEAST ignores comparison operator volatility in contain_volatile_functions_walker
Previous Message PG Bug reporting form 2026-06-02 18:36:47 BUG #19505: Some weird spikes postgresql processes in database (up to 200k sometime) without apparent reasons.