From: | Lukas Fittl <lukas(at)fittl(dot)com> |
---|---|
To: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Cc: | David Rowley <dgrowleyml(at)gmail(dot)com> |
Subject: | Add estimated hit ratio to Memoize in EXPLAIN to explain cost adjustment |
Date: | 2023-03-05 00:20:59 |
Message-ID: | CAP53Pky29GWAVVk3oBgKBDqhND0BRBN6yTPeguV_qSivFL5N_g@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Hi,
I was debugging a planner problem on Postgres 14.4 the other day - and the
involved "bad" plan was including Memoize - though I don't necessarily
think that Memoize is to blame (and this isn't any of the problems recently
fixed in Memoize costing).
However, what I noticed whilst trying different ways to fix the plan, is
that the Memoize output was a bit hard to reason about - especially since
the plan involving Memoize was expensive to run, and so I was mostly
running EXPLAIN without ANALYZE to look at the costing.
Here is an example of the output I was looking at:
-> Nested Loop (cost=1.00..971672.56 rows=119623 width=0)
-> Index Only Scan using table1_idx on table1
(cost=0.43..372676.50 rows=23553966 width=8)
-> Memoize (cost=0.57..0.61 rows=1 width=8)
Cache Key: table1.table2_id
Cache Mode: logical
-> Index Scan using table2_idx on table2
(cost=0.56..0.60 rows=1 width=8)
Index Cond: (id = table1.table2_id)
The other plan I was comparing with (that I wanted the planner to choose
instead), had a total cost of 1,451,807.35 -- and so I was trying to figure
out why the Nested Loop was costed as 971,672.56.
Simple math makes me expect the Nested Loop should roughly have a total
cost of14,740,595.76 here (372,676.50 + 23,553,966 * 0.61), ignoring a lot
of the smaller costs. Thus, in this example, it appears Memoize made the
plan cost significantly cheaper (roughly 6% of the regular cost).
Essentially this comes down to the "cost reduction" performed by Memoize
only being implicitly visible in the Nested Loop's total cost - and with
nothing useful on the Memoize node itself - since the rescan costs are not
shown.
I think explicitly adding the estimated cache hit ratio for Memoize nodes
might make this easier to reason about, like this:
-> Memoize (cost=0.57..0.61 rows=1 width=8)
Cache Key: table1.table2_id
Cache Mode: logical
Cache Hit Ratio Estimated: 0.94
Alternatively (or in addition) we could consider showing the "ndistinct"
value that is calculated in cost_memoize_rescan - since that's the most
significant contributor to the cache hit ratio (and you can influence that
directly by improving the ndistinct statistics).
See attached a patch that implements showing the cache hit ratio as a
discussion starter.
I'll park this in the July commitfest for now.
Thanks,
Lukas
--
Lukas Fittl
Attachment | Content-Type | Size |
---|---|---|
v1-0001-Add-Estimated-Cache-Hit-Ratio-for-Memoize-plan-no.patch | application/octet-stream | 5.7 KB |
From | Date | Subject | |
---|---|---|---|
Next Message | Dave Cramer | 2023-03-05 00:39:23 | Re: Request for comment on setting binary format output per session |
Previous Message | David G. Johnston | 2023-03-05 00:13:13 | Re: Request for comment on setting binary format output per session |