Subplan result caching

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Subplan result caching
Date: 2018-05-23 09:31:40
Message-ID: daceb327-9a20-51f4-fe6c-60b898692305@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I've been working on a patch to add a little cache to SubPlans, to speed
up queries with correlated subqueries, where the same subquery is
currently executed multiple times with the same parameters. The idea is
to cache the result of the subplan, with the correlation vars as the
cache key.

That helps a lot, if you happen to have that kind of a query. I bumped
into this while looking at TPC-DS query 6:

select a.ca_state state, count(*) cnt
from customer_address a
,customer c
,store_sales s
,date_dim d
,item i
where a.ca_address_sk = c.c_current_addr_sk
and c.c_customer_sk = s.ss_customer_sk
and s.ss_sold_date_sk = d.d_date_sk
and s.ss_item_sk = i.i_item_sk
and d.d_month_seq =
(select distinct (d_month_seq)
from date_dim
where d_year = 2000
and d_moy = 5 )
and i.i_current_price > 1.2 *
(select avg(j.i_current_price)
from item j
where j.i_category = i.i_category)
group by a.ca_state
having count(*) >= 10
order by cnt

The first subquery is uncorrelated, and is already handled efficiently
as an InitPlan. This patch helps with the second subquery. There are
only 11 different categories, but we currently re-execute it for every
row of the outer query, over 26000 times. (I think I have about 1 GB of
data in my little database I've been testing with, I'm not sure how this
would scale with the amount of data.) With this patch, it's only
executed 11 times, the cache avoids the rest of the executions. That
brings the runtime, on my laptop, from about 30 s to 120 ms.

For this particular query, I actually wish we could pull up that
subquery, instead. I did some investigation into that last summer,
https://www.postgresql.org/message-id/67e353e8-c20e-7980-a282-538779edf25b%40iki.fi,
but that's a much bigger project. In any case, even if the planner was
able to pull up subqueries in more cases, a cache like this would still
be helpful for those cases where pulling up was still not possible.

Thoughts?

- Heikki

Attachment Content-Type Size
0001-Add-comment-to-explain-that-SubPlan-can-return-its-r.patch text/x-patch 1.4 KB
0002-Remove-some-unused-code.patch text/x-patch 2.8 KB
0003-Add-a-Param-result-cache-to-SubPlan-nodes.patch text/x-patch 24.6 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Laurenz Albe 2018-05-23 10:03:30 Re: Subplan result caching
Previous Message Magnus Hagander 2018-05-23 09:15:28 Re: SCRAM with channel binding downgrade attack