| From: | Attila Soki <atiware(at)gmx(dot)net> |
|---|---|
| To: | Andrei Lepikhov <lepihov(at)gmail(dot)com>, Laurenz Albe <laurenz(dot)albe(at)cybertec(dot)at> |
| Cc: | pgsql-performance(at)postgresql(dot)org |
| Subject: | Re: unstable query plan on pg 16,17,18 |
| Date: | 2026-02-27 15:00:25 |
| Message-ID: | 24F30E2F-038C-4E3D-A8AA-1C8EAAF2E547@gmx.net |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-performance |
On 27 Feb 2026, at 09:15, Andrei Lepikhov <lepihov(at)gmail(dot)com> wrote:
>
> -> Hash Right Join (cost=210369.25..210370.30 rows=8 width=99)
> (actual time=150.790..150.853 rows=44.56 loops=21798)
>
> Schema of this part of the query tree is as the following:
>
> Hash Right Join (loops=21798)
> │
> ├─ [Left/Probe] GroupAggregate (loops=14426)
> │ └─ Merge Right Anti Join
> │ └─ Merge Join
> │ └─ Index Only Scan on table_k gkal_2 (loops=14426)
> │
> └─ [Right/Build = Hash] Nested Loop (loops=21798)
> ├─ Index Scan on table_o goftr_1 (loops=21798)
> │ Index Cond: goftr_1.au_id = gauf_1.id <http://gauf_1.id/>
> └─ Index Scan on table_k gkal_1
> Index Cond: gkal_1.oo_id = goftr_1.id <http://goftr_1.id/>
>
> So, the hash table is rebuilt each rescan based on the changed 'gauf_1.id <http://gauf_1.id/>' external parameter.
> Without the query, it is hard to say exactly what the trigger of this problem is. Having a reproduction, we could use planner advising extensions and see how additional knowledge of true cardinalities rebuilds the query plan. Sometimes, additional LATERAL restriction, added by the planner to pull-up subplan, restricts the join search scope badly, but I doubt if we have this type of problem here.
I searched for the condition kal.dp_end_dat < current_date, then realized that this part of the explain is misleading.
Index Scan using table_k_late_spec_dp_end_dat_key on schema1.table_k kal (cost=0.28..122468.46 rows=196053 width=24) (actual time=0.039..0.614 rows=471.00 loops=1)
Output: kal.dp_rti_id, kal.art_dp_res, kal.oo_id
Index Cond: (kal.dp_end_dat < ('now'::cstring)::date)
Index Searches: 1
Buffers: shared hit=230 read=49
I/O Timings: shared read=0.142
The definiton of the index table_k_late_spec_dp_end_dat_key is:
CREATE INDEX table_k_late_spec_dp_end_dat_key
ON schema1.table_k
USING btree
(dp_end_dat)
WHERE dp_st_dat IS NOT NULL AND dp_end_dat IS NOT NULL AND dp_status IS NOT NULL AND dp_status > 0 AND oo_id IS NOT NULL AND COALESCE(art_rtd, 0.0000) < (COALESCE(art_grt, 0.0000) + COALESCE(art_grt_j2j, 0.0000));
This, because the where in index corresponds the where in query. so the simplified query is:
SELECT * FROM schema1.table_k AS kal
WHERE dp_end_dat < current_date AND dp_st_dat IS NOT NULL AND dp_end_dat IS NOT NULL AND dp_status IS NOT NULL AND dp_status > 0 AND oo_id IS NOT NULL AND COALESCE(art_rtd, 0.0000) < (COALESCE(art_grt, 0.0000) + COALESCE(art_grt_j2j, 0.0000));
The surrounding query part of the view is below, where the part with "dp_end_dat < current_date" is in the "with late as ()":
WITH late AS (
SELECT kal.dp_rti_id AS rti_id,
sum(COALESCE(kal.art_dp_res, 0.0000)) AS sum_art_dp_late
FROM schema1.table_k kal
WHERE kal.dp_status IS NOT NULL AND kal.dp_status > 0 AND COALESCE(kal.art_rtd, 0.0000) < (COALESCE(kal.art_grt, 0.0000) + COALESCE(kal.art_grt_j2j, 0.0000)) AND kal.dp_st_dat IS NOT NULL AND kal.dp_end_dat IS NOT NULL AND kal.dp_end_dat < 'now'::text::date AND kal.oo_id IS NOT NULL
AND NOT (EXISTS (
SELECT akdt_late.oo_id
FROM schema1.table_k_dtg akdt_late — ------ this is a view
WHERE akdt_late.dp_rti_id::text = kal.dp_rti_id::text AND akdt_late.oo_id IS NOT NULL
AND (akdt_late.art_prov_res > 0.0000 OR akdt_late.dp_status > 0 AND akdt_late.art_dp_res > 0.0000)
AND akdt_late.datum >= 'now'::text::date
AND (akdt_late.a_status::text = ANY (ARRAY['d'::character varying::text, 'v'::character varying::text, 'i'::character varying::text]))
AND akdt_late.ih_flag AND kal.oo_id::text = akdt_late.oo_id::text
))
GROUP BY kal.dp_rti_id
)
SELECT akd.oo_id,
akd.dp_rti_id AS rti_id,
akd.datum,
akd.lgaagng AS auf_lgaagng,
akd.rueday_def,
akd.rettag_def,
COALESCE(min(COALESCE(sum_ast_per_day.sum_per_day, 0.0000)), 0.0000) AS sum_ast_per_day,
COALESCE(max(COALESCE(sum_red_per_day.sum_per_day, 0.0000)), 0.0000) AS sum_red_per_day,
CASE
WHEN akd.datum > 'now'::text::date THEN COALESCE(late.sum_art_dp_late, 0.0000)
ELSE 0.0000
END AS sum_art_dp_late
FROM schema1.table_k_future_dt akd — ------ this is a view
LEFT JOIN schema1.dd_ext ext_dd ON ext_dd.id::text = akd.ext::text
LEFT JOIN schema1.dp_epkt ext_dd_dpe ON ext_dd_dpe.id::text = ext_dd.table_d_id::text
LEFT JOIN late ON late.rti_id::text = akd.dp_rti_id::text
LEFT JOIN LATERAL (
SELECT COALESCE(sum(COALESCE(stk.anz, 0.0000)), 0.0000) AS sum_per_day
FROM schema1.al_ast stk
WHERE stk.rti_id::text = akd.dp_rti_id::text AND stk.von <= akd.datum AND stk.bis >= akd.datum
GROUP BY akd.datum
) sum_ast_per_day ON (
EXISTS (
SELECT al_ast.rti_id
FROM schema1.al_ast
WHERE al_ast.rti_id::text = akd.dp_rti_id::text
)
)
LEFT JOIN LATERAL (
SELECT COALESCE(sum(COALESCE(alred.anz, 0.0000)), 0.0000) AS sum_per_day
FROM schema1.al_red alred
WHERE alred.rti_id::text = akd.dp_rti_id::text AND alred.von <= akd.datum AND alred.bis >= akd.datum
GROUP BY akd.datum
) sum_red_per_day ON (
EXISTS (
SELECT al_red.rti_id
FROM schema1.al_red
WHERE al_red.rti_id::text = akd.dp_rti_id::text
)
)
WHERE (ext_dd.table_d_id IS NULL OR ext_dd.table_d_id::text = 'schema1'::text OR NOT COALESCE(ext_dd_dpe.enabled, false))
AND akd.rti_id::text !~~ 'P%'::text AND akd.dp_rti_id::text !~~ 'P%'::text
AND (akd.art_dp_res > 0.0000 OR akd.art_prov_res > 0.0000 OR akd.art_dp_zm > 0.0000)
AND (akd.lgaagng IS NULL OR akd.lgaagng::date >= 'now'::text::date AND akd.lgaagng::date <= ('now'::text::date + '3 mons'::interval)::date)
GROUP BY akd.oo_id, akd.dp_rti_id, akd.datum, akd.lgaagng, akd.rueday_def, akd.rettag_def, late.sum_art_dp_late
"gauf" is in one of the table_k_* views, and looks like below. There are multiple variants, they differ mostly in "where" part.
SELECT gdt.datum,
gkal.rti_id,
gdt.au_id,
gkal.oo_id,
gkal.id AS kal_id,
gauf.status AS a_status,
goftr_1.token AS ih_flag,
gdt.prov,
gdt.def,
gkal.dp_status,
gkal.ext,
gdt.rueday_def,
gdt.rettag_def,
gdt.rueday_prov,
gdt.rettag_prov,
gauf.lgaagng,
gauf.lgaein,
gkal.art_dp_res,
gkal.art_prov_res,
gkal.art_dp_zm,
gkal.rti,
gkal.art_dp_extern,
gkal.dp_rti_id,
gkal.art_dp_lga,
gkal.set_fix_vkpt
FROM schema1.table_a_dtg gdt
LEFT JOIN schema1.table_a gauf ON gauf.id::text = gdt.au_id::text
LEFT JOIN schema1.auf_oos goftr_1 ON goftr_1.au_id::text = gauf.id::text
LEFT JOIN schema1.table_k gkal ON gkal.oo_id::text = goftr_1.id::text
WHERE gdt.datum >= ('now'::text::date - '7 days'::interval)::date;;
I tried to change the statistics of dp_end_dat and also all of the fields in "where" from 10 to 1500 increased in increments of 10.
One field at once then all fields together. the estimate got not better, actual rows 471, planned rows somewhere between 180000 and 195000.
then i checked the same query on pg 14, the estimate is the same as on pg 18.
ALTER TABLE scema1.table_k ALTER dp_end_dat SET STATISTICS 140;
ALTER TABLE
ALTER TABLE scema1.table_k ALTER dp_status SET STATISTICS 140;
ALTER TABLE
ALTER TABLE scema1.table_k ALTER oo_id SET STATISTICS 140;
ALTER TABLE
ALTER TABLE scema1.table_k ALTER art_rtd SET STATISTICS 140;
ALTER TABLE
ALTER TABLE scema1.table_k ALTER art_grt SET STATISTICS 140;
ALTER TABLE
ALTER TABLE scema1.table_k ALTER art_grt_j2j SET STATISTICS 140;
ALTER TABLE
ANALYZE scema1.table_k;
pg14 at statistics 140:
EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT * FROM schema1.table_k AS kal WHERE dp_end_dat < current_date AND dp_st_dat IS NOT NULL AND dp_end_dat IS NOT NULL AND dp_status IS NOT NULL AND dp_status > 0 AND oo_id IS NOT NULL AND COALESCE(art_rtd, 0.0000) < (COALESCE(art_grt, 0.0000) + COALESCE(art_grt_j2j, 0.0000));
Index Scan using table_k_late_spec_dp_end_dat_key on table_k kal (cost=0.28..122750.89 rows=193091 width=614) (actual time=0.010..0.261 rows=471 loops=1)
Index Cond: (dp_end_dat < CURRENT_DATE)
Buffers: shared hit=279
Settings: hash_mem_multiplier = '2.5', jit = 'off', max_parallel_workers = '4', max_parallel_workers_per_gather = '4', random_page_cost = '1.2', temp_buffers = '512MB', work_mem = '768MB'
Planning:
Buffers: shared hit=1459
Planning Time: 3.101 ms
Execution Time: 0.325 ms
pg18 at statistics 140:
EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT * FROM schema1.table_k AS kal WHERE dp_end_dat < current_date AND dp_st_dat IS NOT NULL AND dp_end_dat IS NOT NULL AND dp_status IS NOT NULL AND dp_status > 0 AND oo_id IS NOT NULL AND COALESCE(art_rtd, 0.0000) < (COALESCE(art_grt, 0.0000) + COALESCE(art_grt_j2j, 0.0000));
Index Scan using table_k_late_spec_dp_end_dat_key on table_k kal (cost=0.28..122561.69 rows=195550 width=624) (actual time=0.021..0.514 rows=471.00 loops=1)
Index Cond: (dp_end_dat < CURRENT_DATE)
Index Searches: 1
Buffers: shared hit=279
Settings: temp_buffers = '512MB', work_mem = '768MB', hash_mem_multiplier = '2.5', jit = 'off', max_parallel_workers_per_gather = '4', max_parallel_workers = '4', random_page_cost = '1.2'
Planning:
Buffers: shared hit=1508
Planning Time: 3.123 ms
Execution Time: 0.639 ms
(9 rows)
I hope I have selected the correct parts of the query, as it is not really possible to share the entire query with all its dependencies.
The first version of this query was written for PostgreSQL 8.3; since then, it has been refactored and optimized a few times when necessary. I will check if it is possible to reorder the query without rewriting everything.
I dont know the inner workings of analyze, is that normal that executing analyze on unchanged data can flip the plan? Does analyze select a random set of rows?
Thanks.
regards,
Attila
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Andrei Lepikhov | 2026-02-27 15:35:31 | Re: unstable query plan on pg 16,17,18 |
| Previous Message | Andrei Lepikhov | 2026-02-27 08:15:33 | Re: unstable query plan on pg 16,17,18 |