Two "equivalent" WITH RECURSIVE queries, one of them slow.

From: "Octavio Alvarez" <alvarezp(at)alvarezp(dot)ods(dot)org>
To: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Two "equivalent" WITH RECURSIVE queries, one of them slow.
Date: 2010-07-05 06:07:20
Message-ID: op.vfcwmiow4oyyg1@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello.

I have a tree-like table with a three-field PK (name, date, id) and one
parent field.
It has 5k to 6k records as of now, but it will hold about 1 million
records.

I am trying the following WITH RECURSIVE query:

WITH RECURSIVE t AS (
SELECT par.id AS tid, par.name, par.date, par.id,
par.text, par.h_title, par.h_name, par.parent
FROM _books.par
UNION
SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
p.h_title, p.h_name, p.parent
FROM t, _books.par p
WHERE p.name = t.name AND p.date = t.date AND t.id =
p.parent
)
SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';

... which takes 2547.503 ms

However, if I try the same query but adding the same WHERE clause to the
non-recursive term, I get much better results.

WITH RECURSIVE t AS (
SELECT par.id AS tid, par.name, par.date, par.id,
par.text, par.h_title, par.h_name, par.parent
FROM _books.par WHERE name = 'cfx' AND date =
'2009-08-19' AND par.id = '28340'
UNION
SELECT t.tid AS pid, p.name, p.date, p.id, p.text,
p.h_title, p.h_name, p.parent
FROM t, _books.par p
WHERE p.name = t.name AND p.date = t.date AND t.id =
p.parent
)
SELECT t.tid, t.name, t.date, t.id, t.text, t.h_title, t.h_name, t.parent
FROM t WHERE name = 'cfx' AND date = '2009-08-19' AND tid = '28340';

... which takes 0.221 ms

I am being forced to use the slow query because I want to define it as a
view, leaving the WHERE clause to the application.

I fail to see where the two queries might be different, or, what cases the
slow one considers that the fast one doesn't, as to get a clue on how to
workaround this.

I have taken the EXPLAIN ANALYZE output for both queries. It looks like
the slow one is processing all records (read: not adding the WHERE clause
to the non-recursive term).

QUERY
PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on t (cost=96653.20..96820.57 rows=1 width=144) (actual
time=32.931..2541.792 rows=1 loops=1)
Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date)
AND (tid = 28340))
CTE t
-> Recursive Union (cost=0.00..96653.20 rows=6086 width=212)
(actual time=0.017..2442.655 rows=33191 loops=1)
-> Seq Scan on par (cost=0.00..237.96 rows=5996 width=208)
(actual time=0.011..5.591 rows=5996 loops=1)
-> Merge Join (cost=8909.74..9629.35 rows=9 width=212)
(actual time=225.979..254.727 rows=3022 loops=9)
Merge Cond: (((t.name)::text = (p.name)::text) AND
(t.date = p.date) AND (t.id = p.parent))
-> Sort (cost=7700.54..7850.44 rows=59960 width=44)
(actual time=58.163..59.596 rows=3685 loops=9)
Sort name: t.name, t.date, t.id
Sort Method: quicksort Memory: 17kB
-> WorkTable Scan on t (cost=0.00..1199.20
rows=59960 width=44) (actual time=0.027..3.486 rows=3688 loops=9)
-> Materialize (cost=1209.20..1284.15 rows=5996
width=208) (actual time=163.062..177.415 rows=5810 loops=9)
-> Sort (cost=1209.20..1224.19 rows=5996
width=208) (actual time=163.054..172.543 rows=5810 loops=9)
Sort name: p.name, p.date, p.parent
Sort Method: external merge Disk: 1304kB
-> Seq Scan on par p (cost=0.00..237.96
rows=5996 width=208) (actual time=0.015..3.330 rows=5996 loops=9)
Total runtime: 2547.503 ms
(17 rows)

QUERY
PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------
CTE Scan on t (cost=927.80..928.10 rows=1 width=144) (actual
time=0.036..0.132 rows=1 loops=1)
Filter: (((name)::text = 'cfx'::text) AND (date = '2009-08-19'::date)
AND (tid = 28340))
CTE t
-> Recursive Union (cost=0.00..927.80 rows=11 width=212) (actual
time=0.030..0.124 rows=1 loops=1)
-> Index Scan using par_id on par (cost=0.00..8.27 rows=1
width=208) (actual time=0.024..0.026 rows=1 loops=1)
Index Cond: (id = 28340)
Filter: (((name)::text = 'cfx'::text) AND (date =
'2009-08-19'::date))
-> Nested Loop (cost=0.00..91.93 rows=1 width=212) (actual
time=0.091..0.091 rows=0 loops=1)
Join Filter: (((t.name)::text = (p.name)::text) AND
(t.date = p.date))
-> WorkTable Scan on t (cost=0.00..0.20 rows=10
width=44) (actual time=0.001..0.001 rows=1 loops=1)
-> Index Scan using par_parent on par p
(cost=0.00..9.07 rows=6 width=208) (actual time=0.085..0.085 rows=0
loops=1)
Index Cond: (p.parent = t.id)
Total runtime: 0.221 ms
(13 rows)

books=# \d _books.par
Table "_books.par"
Column | Type | Modifiers
--------------+-------------------+-----------
name | character varying | not null
date | date | not null
id | integer | not null
text | character varying |
h_title | character varying |
h_name | character varying |
parent | integer |
Indexes:
"par_pkey" PRIMARY KEY, btree (name, date, id)
"par_name" btree (name)
"par_name_fpub_parent" btree (name, date, parent)
"par_id" btree (id)
"par_parent" btree (parent)

$ psql --version
psql (PostgreSQL) 8.4.4
contains support for command-line editing

--
Octavio.

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message damien hostin 2010-07-05 07:28:03 Re: Slow query with planner row strange estimation
Previous Message Craig Ringer 2010-07-04 12:38:36 Re: Performance issues with postgresql-8.4.0