| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | Andrei Lepikhov <lepihov(at)gmail(dot)com> |
| Cc: | Mauro Gatti <mauro(dot)gt70(at)gmail(dot)com>, pgsql-performance(at)lists(dot)postgresql(dot)org |
| Subject: | Re: Planner join order regression from PG 15 to PG 16+: 70ms -> 1440ms (self-contained reproducer included) |
| Date: | 2026-03-05 21:44:38 |
| Message-ID: | 14531.1772747078@sss.pgh.pa.us |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-performance |
Andrei Lepikhov <lepihov(at)gmail(dot)com> writes:
> On 5/3/26 17:25, Mauro Gatti wrote:
>> ## Questions for the community
> Thanks for stable reproduction!
Yes, we appreciate that much effort being put into trouble reports.
Makes it a lot easier to see what's going wrong.
> Your case is typical for 'never executed' nodes. As you can see, the
> costs of your query plans are very close, and the estimation error is
> large due to multiple clauses in your filter. As I see, for the planner,
> there is no difference in which version of the plan to choose - it is
> just a game of chance.
Right. Given the very-far-off rowcount estimates for some of the
index scans, it'd be surprising if the planner arrived at a good
join order. It's a "garbage in, garbage out" situation. As Andrei
suggested, you can often improve bad rowcount estimates by creating
custom statistics.
I found it was sufficient to do
=# create statistics on brand_id,line_code,model_year,model_code,version_code,pricelist_id from pricelist_options;
CREATE STATISTICS
=# analyze pricelist_options;
ANALYZE
That doesn't result in fully accurate estimates:
=# explain analyze select * from pricelist_options pl where ((pl.brand_id = 10) AND ((pl.line_code)::text = 'ABC'::text) AND ((pl.model_year)::text = 'YR'::text) AND ((pl.model_code)::text = 'ABC'::text) AND ((pl.version_code)::text = 'VER-001'::text) AND (pl.pricelist_id = 100));
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Index Scan using ix_pricelist_options_1 on pricelist_options pl (cost=0.43..117.12 rows=28 width=106) (actual time=0.015..0.037 rows=165.00 loops=1)
Index Cond: ((brand_id = 10) AND ((line_code)::text = 'ABC'::text) AND ((model_year)::text = 'YR'::text) AND ((model_code)::text = 'ABC'::text) AND ((version_code)::text = 'VER-001'::text) AND (pricelist_id = 100))
but "28 rows" is a lot closer to 165 than "1 row", and it's enough
to push the planner to choose the plan you want.
I do concur with Andrei's recommendation to create stats matching
your other multicolumn indexes, though.
regards, tom lane
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Mauro Gatti | 2026-03-06 08:24:03 | Re: Planner join order regression from PG 15 to PG 16+: 70ms -> 1440ms (self-contained reproducer included) |
| Previous Message | Andrei Lepikhov | 2026-03-05 19:09:06 | Re: Planner join order regression from PG 15 to PG 16+: 70ms -> 1440ms (self-contained reproducer included) |