Re: CTE materializing sets?

From: Виктор Егоров <vyegorov(at)gmail(dot)com>
To: Serge Fonville <serge(dot)fonville(at)gmail(dot)com>
Cc: Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, Liam Caffrey <liam(dot)caffrey(at)gmail(dot)com>, pgsql-general(at)postgresql(dot)org
Subject: Re: CTE materializing sets?
Date: 2012-10-09 10:31:32
Message-ID: CAGnEbohkbvxQegibq5SZj+qoD_JERWNbq7xcnihqf+wbcjG-Kw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

2012/10/9 Serge Fonville <serge(dot)fonville(at)gmail(dot)com>:
> This indeed is a very interesting question.
>
> At http://wiki.postgresql.org/wiki/CTEReadme it seems to suggest that a CTE
> is just rewritten and the resulting query is executed.

As was mentioned a couple of times in this list, CTE do have
optimization fence feature (per SQL Standard).
I asked on the #postgresql channel and was pointed, that typically you
get benefits of this feature
when you have to join grouping subquery to itself.

I went and did some tests. Table "attempt" contains e-mail delivery
attempts for the postfix:

# select relname,relpages,reltuples::numeric,pg_size_pretty(pg_relation_size(oid))
from pg_class where relname='attempt';
relname | relpages | reltuples | pg_size_pretty
---------+----------+-----------+----------------
attempt | 145117 | 4252530 | 1134 MB

My default work_mem is 1MB on this instance.

First, plain query with 2 subqueries:

# explain (analyze, buffers)
select a.eid, b.eid from
(select recipient_email_id eid, count(*) cnt, min(tstamp) as minmsg,
max(tstamp) as maxmsg from attempt group by recipient_email_id) a,
(select recipient_email_id eid, count(*) cnt, min(tstamp) as minmsg,
max(tstamp) as maxmsg from attempt group by recipient_email_id) b
where a.minmsg = b.maxmsg;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=1861911.11..1953183.16 rows=6067386 width=16)
(actual time=65758.378..66115.400 rows=59845 loops=1)
Merge Cond: (a.minmsg = b.maxmsg)
Buffers: shared hit=1590 read=288644, temp read=103129 written=103134
-> Sort (cost=930955.56..931042.64 rows=34835 width=16) (actual
time=30242.503..30370.379 rows=212434 loops=1)
Sort Key: a.minmsg
Sort Method: external merge Disk: 5400kB
Buffers: shared hit=779 read=144338, temp read=51481 written=51481
-> Subquery Scan on a (cost=873875.76..927729.06 rows=34835
width=16) (actual time=26744.434..30008.996 rows=212434 loops=1)
Buffers: shared hit=779 read=144338, temp read=50561
written=50561
-> GroupAggregate (cost=873875.76..927380.71
rows=34835 width=16) (actual time=26744.433..29951.390 rows=212434
loops=1)
Buffers: shared hit=779 read=144338, temp
read=50561 written=50561
-> Sort (cost=873875.76..884507.08 rows=4252528
width=16) (actual time=26744.273..28296.850 rows=4255749 loops=1)
Sort Key: public.attempt.recipient_email_id
Sort Method: external merge Disk: 108168kB
Buffers: shared hit=779 read=144338, temp
read=50561 written=50561
-> Seq Scan on attempt
(cost=0.00..187642.28 rows=4252528 width=16) (actual
time=0.010..13618.612 rows=4255749 loops=1)
Buffers: shared hit=779 read=144338
-> Materialize (cost=930955.56..931129.73 rows=34835 width=16)
(actual time=35515.860..35640.974 rows=214271 loops=1)
Buffers: shared hit=811 read=144306, temp read=51648 written=51653
-> Sort (cost=930955.56..931042.64 rows=34835 width=16)
(actual time=35515.853..35586.598 rows=210800 loops=1)
Sort Key: b.maxmsg
Sort Method: external merge Disk: 5384kB
Buffers: shared hit=811 read=144306, temp read=51648
written=51653
-> Subquery Scan on b (cost=873875.76..927729.06
rows=34835 width=16) (actual time=31879.743..35251.218 rows=212434
loops=1)
Buffers: shared hit=811 read=144306, temp
read=50561 written=50561
-> GroupAggregate (cost=873875.76..927380.71
rows=34835 width=16) (actual time=31879.741..35184.965 rows=212434
loops=1)
Buffers: shared hit=811 read=144306, temp
read=50561 written=50561
-> Sort (cost=873875.76..884507.08
rows=4252528 width=16) (actual time=31879.577..33460.975 rows=4255749
loops=1)
Sort Key: public.attempt.recipient_email_id
Sort Method: external merge Disk: 108168kB
Buffers: shared hit=811 read=144306,
temp read=50561 written=50561
-> Seq Scan on attempt
(cost=0.00..187642.28 rows=4252528 width=16) (actual
time=0.012..17637.516 rows=4255749 loops=1)
Buffers: shared hit=811 read=144306
Total runtime: 67611.657 ms
(34 rows)

The source relation is scanned twice. Now, using CTE and it's
materialization feature:

# explain (analyze, buffers)
with msgs as (select recipient_email_id eid, count(*) cnt, min(tstamp)
as minmsg, max(tstamp) as maxmsg from attempt group by
recipient_email_id)
select a.eid, b.eid from msgs a, msgs b where a.minmsg=b.maxmsg;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=935227.10..1026499.15 rows=6067386 width=16)
(actual time=41104.560..41414.873 rows=59845 loops=1)
Merge Cond: (a.minmsg = b.maxmsg)
Buffers: shared hit=747 read=144370, temp read=53659 written=53663
CTE msgs
-> GroupAggregate (cost=873875.76..927380.71 rows=34835
width=16) (actual time=36786.597..40186.758 rows=212434 loops=1)
Buffers: shared hit=747 read=144370, temp read=50561 written=50561
-> Sort (cost=873875.76..884507.08 rows=4252528 width=16)
(actual time=36786.430..38434.272 rows=4255749 loops=1)
Sort Key: attempt.recipient_email_id
Sort Method: external merge Disk: 108168kB
Buffers: shared hit=747 read=144370, temp read=50561
written=50561
-> Seq Scan on attempt (cost=0.00..187642.28
rows=4252528 width=16) (actual time=0.028..23059.290 rows=4255749
loops=1)
Buffers: shared hit=747 read=144370
-> Sort (cost=3923.20..4010.28 rows=34835 width=16) (actual
time=40756.464..40836.892 rows=212434 loops=1)
Sort Key: a.minmsg
Sort Method: external merge Disk: 5400kB
Buffers: shared hit=747 read=144370, temp read=51481 written=52570
-> CTE Scan on msgs a (cost=0.00..696.70 rows=34835
width=16) (actual time=36786.604..40439.648 rows=212434 loops=1)
Buffers: shared hit=747 read=144370, temp read=50561
written=51650
-> Materialize (cost=3923.20..4097.37 rows=34835 width=16)
(actual time=348.080..473.192 rows=214271 loops=1)
Buffers: temp read=2178 written=1093
-> Sort (cost=3923.20..4010.28 rows=34835 width=16) (actual
time=348.073..418.229 rows=210800 loops=1)
Sort Key: b.maxmsg
Sort Method: external merge Disk: 5384kB
Buffers: temp read=2178 written=1093
-> CTE Scan on msgs b (cost=0.00..696.70 rows=34835
width=16) (actual time=0.055..75.441 rows=212434 loops=1)
Buffers: temp read=1091 written=1
Total runtime: 43122.214 ms
(27 rows)

The source relation is scanned only once, but system does a lot of IO
with temp files.
Now small tweak and another try:

# set work_mem = '300MB';
SET
# explain (analyze, buffers)
with msgs as (select recipient_email_id eid, count(*) cnt, min(tstamp)
as minmsg, max(tstamp) as maxmsg from attempt group by
recipient_email_id)
select a.eid, b.eid from msgs a, msgs b where a.minmsg=b.maxmsg;
QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------
Merge Join (cost=237165.30..328350.27 rows=6067386 width=16) (actual
time=31494.137..31768.606 rows=59845 loops=1)
Merge Cond: (a.minmsg = b.maxmsg)
Buffers: shared hit=843 read=144274
CTE msgs
-> HashAggregate (cost=230167.56..230515.91 rows=34835
width=16) (actual time=30396.065..30520.867 rows=212434 loops=1)
Buffers: shared hit=843 read=144274
-> Seq Scan on attempt (cost=0.00..187642.28 rows=4252528
width=16) (actual time=0.077..25349.466 rows=4255749 loops=1)
Buffers: shared hit=843 read=144274
-> Sort (cost=3324.70..3411.78 rows=34835 width=16) (actual
time=31115.205..31247.966 rows=212434 loops=1)
Sort Key: a.minmsg
Sort Method: quicksort Memory: 16102kB
Buffers: shared hit=843 read=144274
-> CTE Scan on msgs a (cost=0.00..696.70 rows=34835
width=16) (actual time=30396.129..30780.369 rows=212434 loops=1)
Buffers: shared hit=843 read=144274
-> Sort (cost=3324.70..3411.78 rows=34835 width=16) (actual
time=377.917..424.277 rows=214271 loops=1)
Sort Key: b.maxmsg
Sort Method: quicksort Memory: 16102kB
-> CTE Scan on msgs b (cost=0.00..696.70 rows=34835
width=16) (actual time=0.005..205.626 rows=212434 loops=1)
Total runtime: 31816.000 ms
(19 rows)

As long as I read the output, source table is scanned only once, all
further operations are done in RAM.
P.S. Not a perfect query, but shows the feature.

--
Victor Y. Yegorov

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Willy-Bas Loos 2012-10-09 12:10:26 something better than pgtrgm?
Previous Message Craig Ringer 2012-10-09 10:10:24 Re: CTE materializing sets?