From: | David Fetter <david(at)fetter(dot)org> |
---|---|
To: | Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> |
Cc: | Andreas Karlsson <andreas(at)proxel(dot)se>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Early WIP/PoC for inlining CTEs |
Date: | 2018-10-05 22:43:11 |
Message-ID: | 20181005224310.GR25294@fetter.org |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Oct 05, 2018 at 01:40:05AM +0100, Andrew Gierth wrote:
> >>>>> "Andreas" == Andreas Karlsson <andreas(at)proxel(dot)se> writes:
>
> > On 10/03/2018 05:57 PM, David Fetter wrote:
> >> Is there any meaningful distinction between "inlining," by which I
> >> mean converting to a subquery, and predicate pushdown, which
> >> would happen at least for a first cut, at the rewrite stage?
>
> Yes.
>
> Andreas> Sorry, but I do not think I understand your question. The
> Andreas> ability to push down predicates is just one of the potential
> Andreas> benefits from inlining.
>
> Consider the difference between (in the absence of CTE inlining):
>
> -- inline subquery with no optimization barrier (qual may be pushed down)
> select * from (select x from y) s where x=1;
...and doesn't need to materialize all of y,
> -- inline subquery with optimization barrier (qual not pushed down)
> select * from (select x from y offset 0) s where x=1;
>
> -- CTE with materialization
> with s as (select x from y) select * from s where x=1;
while both of these do. I was interested to discover that on my
synthetic test of 10 million integers from generate_series(1,10000000)
both with and without a b-tree index on x--as expected, the index has
no effect--I consistently get stuff like this:
shackle(at)[local]:5432/shackle(10.5)(18539) > explain (analyze, verbose, costs on, buffers on, timing on) with s as (select x from y) select * from s where x=1;
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
CTE Scan on s (cost=144247.77..369247.25 rows=50000 width=4) (actual time=0.213..2287.355 rows=1 loops=1)
Output: s.x
Filter: (s.x = 1)
Rows Removed by Filter: 9999999
Buffers: shared hit=16310 read=27938, temp written=17089
CTE s
-> Seq Scan on public.y (cost=0.00..144247.77 rows=9999977 width=4) (actual time=0.208..593.426 rows=10000000 loops=1)
Output: y.x
Buffers: shared hit=16310 read=27938
Planning time: 0.110 ms
Execution time: 2313.682 ms
(11 rows)
shackle(at)[local]:5432/shackle(10.5)(18539) > explain (analyze, verbose, costs on, buffers on, timing on) select * from (select x from y offset 0) s where x=1;
QUERY PLAN
════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Subquery Scan on s (cost=0.00..269247.48 rows=1 width=4) (actual time=0.734..1069.012 rows=1 loops=1)
Output: s.x
Filter: (s.x = 1)
Rows Removed by Filter: 9999999
Buffers: shared hit=16316 read=27932
-> Seq Scan on public.y (cost=0.00..144247.77 rows=9999977 width=4) (actual time=0.731..539.463 rows=10000000 loops=1)
Output: y.x
Buffers: shared hit=16316 read=27932
Planning time: 0.114 ms
Execution time: 1069.032 ms
(10 rows)
i.e. for this case, the CTE scan takes over 2.3x the time the simple
materialization does. Also, when I boost work_mem to 1GB (256MB wasn't
enough to avoid "temp written"), there's still a 1.8x penalty.
Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778
Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
From | Date | Subject | |
---|---|---|---|
Next Message | Tatsuo Ishii | 2018-10-05 23:17:04 | Creating Certificates |
Previous Message | Tom Lane | 2018-10-05 21:32:47 | Re: now() vs transaction_timestamp() |