Re: Early WIP/PoC for inlining CTEs

From: David Fetter <david(at)fetter(dot)org>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: Andreas Karlsson <andreas(at)proxel(dot)se>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Early WIP/PoC for inlining CTEs
Date: 2018-10-05 22:43:11
Message-ID: 20181005224310.GR25294@fetter.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 05, 2018 at 01:40:05AM +0100, Andrew Gierth wrote:
> >>>>> "Andreas" == Andreas Karlsson <andreas(at)proxel(dot)se> writes:
>
> > On 10/03/2018 05:57 PM, David Fetter wrote:
> >> Is there any meaningful distinction between "inlining," by which I
> >> mean converting to a subquery, and predicate pushdown, which
> >> would happen at least for a first cut, at the rewrite stage?
>
> Yes.
>
> Andreas> Sorry, but I do not think I understand your question. The
> Andreas> ability to push down predicates is just one of the potential
> Andreas> benefits from inlining.
>
> Consider the difference between (in the absence of CTE inlining):
>
> -- inline subquery with no optimization barrier (qual may be pushed down)
> select * from (select x from y) s where x=1;

...and doesn't need to materialize all of y,

> -- inline subquery with optimization barrier (qual not pushed down)
> select * from (select x from y offset 0) s where x=1;
>
> -- CTE with materialization
> with s as (select x from y) select * from s where x=1;

while both of these do. I was interested to discover that on my
synthetic test of 10 million integers from generate_series(1,10000000)
both with and without a b-tree index on x--as expected, the index has
no effect--I consistently get stuff like this:

shackle(at)[local]:5432/shackle(10.5)(18539) > explain (analyze, verbose, costs on, buffers on, timing on) with s as (select x from y) select * from s where x=1;
QUERY PLAN
══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
CTE Scan on s (cost=144247.77..369247.25 rows=50000 width=4) (actual time=0.213..2287.355 rows=1 loops=1)
Output: s.x
Filter: (s.x = 1)
Rows Removed by Filter: 9999999
Buffers: shared hit=16310 read=27938, temp written=17089
CTE s
-> Seq Scan on public.y (cost=0.00..144247.77 rows=9999977 width=4) (actual time=0.208..593.426 rows=10000000 loops=1)
Output: y.x
Buffers: shared hit=16310 read=27938
Planning time: 0.110 ms
Execution time: 2313.682 ms
(11 rows)

shackle(at)[local]:5432/shackle(10.5)(18539) > explain (analyze, verbose, costs on, buffers on, timing on) select * from (select x from y offset 0) s where x=1;
QUERY PLAN
════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
Subquery Scan on s (cost=0.00..269247.48 rows=1 width=4) (actual time=0.734..1069.012 rows=1 loops=1)
Output: s.x
Filter: (s.x = 1)
Rows Removed by Filter: 9999999
Buffers: shared hit=16316 read=27932
-> Seq Scan on public.y (cost=0.00..144247.77 rows=9999977 width=4) (actual time=0.731..539.463 rows=10000000 loops=1)
Output: y.x
Buffers: shared hit=16316 read=27932
Planning time: 0.114 ms
Execution time: 1069.032 ms
(10 rows)

i.e. for this case, the CTE scan takes over 2.3x the time the simple
materialization does. Also, when I boost work_mem to 1GB (256MB wasn't
enough to avoid "temp written"), there's still a 1.8x penalty.

Best,
David.
--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2018-10-05 23:17:04 Creating Certificates
Previous Message Tom Lane 2018-10-05 21:32:47 Re: now() vs transaction_timestamp()