performance regression in 9.2 CTE with SRF function

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: performance regression in 9.2 CTE with SRF function
Date: 2013-02-09 08:52:00
Message-ID: CAFj8pRAfaCBEPfo4Zpyck1wVgShboXWA2VEY+=rGfeLDgUQCBw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello

In Czech discussion group was reported performance regression of CTE
query. I wrote a test, when I can show it.

Execution on 9.2, 9.3 is about 9 sec, on 8.4, 9.1 is about 30ms with
default work_mem ='1MB'

optimal plan (8.4)
QUERY PLAN
----------------------------------------------------------------------------------------------------------------
Hash Join (cost=45.54..69.32 rows=441 width=4) (actual
time=29.943..29.943 rows=0 loops=1)
Output: x2.a
Hash Cond: (x2.a = pl.a)
CTE pl
-> HashAggregate (cost=17.50..18.39 rows=89 width=4) (actual
time=6.723..7.016 rows=89 loops=1)
Output: x.a
-> Seq Scan on x (cost=0.00..15.00 rows=1000 width=4)
(actual time=0.026..3.390 rows=1000 loops=1)
Output: x.a
-> Seq Scan on x2 (cost=24.25..41.75 rows=500 width=4) (actual
time=21.178..21.757 rows=131 loops=1)
Output: x2.a, x2.b
Filter: (NOT (hashed SubPlan 2))
SubPlan 2
-> CTE Scan on pl (cost=0.00..24.03 rows=89 width=4)
(actual time=0.208..18.821 rows=1000 loops=1)
Output: foo(pl.a)
-> Hash (cost=1.78..1.78 rows=89 width=4) (actual
time=7.895..7.895 rows=89 loops=1)
Output: pl.a
-> CTE Scan on pl (cost=0.00..1.78 rows=89 width=4) (actual
time=6.733..7.592 rows=89 loops=1)
Output: pl.a
Total runtime: 30.046 ms

suboptimal plan (9.3)
QUERY PLAN
────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Nested Loop (cost=18.39..346248.92 rows=441 width=4) (actual
time=9373.295..9373.295 rows=0 loops=1)
Output: x2.a
Join Filter: (x2.a = pl.a)
Rows Removed by Join Filter: 11837
CTE pl
-> HashAggregate (cost=17.50..18.39 rows=89 width=4) (actual
time=4.363..4.539 rows=89 loops=1)
Output: x.a
-> Seq Scan on public.x (cost=0.00..15.00 rows=1000
width=4) (actual time=0.031..2.133 rows=1000 loops=1)
Output: x.a
-> CTE Scan on pl (cost=0.00..1.78 rows=89 width=4) (actual
time=4.371..4.519 rows=89 loops=1)
Output: pl.a
-> Materialize (cost=0.00..345562.50 rows=500 width=4) (actual
time=0.203..105.051 rows=133 loops=89)
Output: x2.a
-> Seq Scan on public.x2 (cost=0.00..345560.00 rows=500
width=4) (actual time=17.914..9330.645 rows=133 loops=1)
Output: x2.a
Filter: (NOT (SubPlan 2))
Rows Removed by Filter: 867
SubPlan 2
-> CTE Scan on pl pl_1 (cost=0.00..468.59
rows=89000 width=4) (actual time=0.023..8.379 rows=566 loops=1000)
Output: foo(pl_1.a)
Total runtime: 9373.444 ms

Higher work_mem 3MB fixes this issue
QUERY PLAN
─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
Hash Join (cost=712.37..736.15 rows=441 width=4) (actual
time=38.024..38.024 rows=0 loops=1)
Output: x2.a
Hash Cond: (x2.a = pl.a)
CTE pl
-> HashAggregate (cost=17.50..18.39 rows=89 width=4) (actual
time=8.457..8.759 rows=89 loops=1)
Output: x.a
-> Seq Scan on public.x (cost=0.00..15.00 rows=1000
width=4) (actual time=0.029..5.073 rows=1000 loops=1)
Output: x.a
-> Seq Scan on public.x2 (cost=691.09..708.59 rows=500 width=4)
(actual time=27.629..28.152 rows=105 loops=1)
Output: x2.a, x2.b
Filter: (NOT (hashed SubPlan 2))
Rows Removed by Filter: 895
SubPlan 2
-> CTE Scan on pl pl_1 (cost=0.00..468.59 rows=89000
width=4) (actual time=0.328..24.224 rows=1000 loops=1)
Output: foo(pl_1.a)
-> Hash (cost=1.78..1.78 rows=89 width=4) (actual
time=9.650..9.650 rows=89 loops=1)
Output: pl.a
Buckets: 1024 Batches: 1 Memory Usage: 3kB
-> CTE Scan on pl (cost=0.00..1.78 rows=89 width=4) (actual
time=8.468..9.346 rows=89 loops=1)
Output: pl.a
Total runtime: 38.153 ms

There is terribly wrong estimation when work_mem is lower

SubPlan 2
-> CTE Scan on pl pl_1 (cost=0.00..468.59
rows=89000 width=4) (actual time=0.023..8.379 rows=566 loops=1000)
Output: foo(pl_1.a)

tables x and x2 are wery small - 64KB

link to original thread - sorry it is in Czech language (but attached
plans are in English :))
http://groups.google.com/group/postgresql-cz/browse_thread/thread/75e0e34e88950802?hl=cs

Regards
Pavel Stehule

Attachment Content-Type Size
test-with.sql application/octet-stream 741 bytes

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Miroslav Šimulčík 2013-02-09 09:13:23 Re: function for setting/getting same timestamp during whole transaction
Previous Message Erik Rijkers 2013-02-09 08:20:35 Re: pgsql: psql: Improve expanded print output in tuples-only mode