LIKE query problem

From: Marc McIntyre <mmcintyre(at)squiz(dot)net>
To: pgsql-performance(at)postgresql(dot)org
Subject: LIKE query problem
Date: 2006-09-18 23:48:10
Message-ID: 450F303A.6090603@squiz.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

I'm having a problem with a simple query, that finds children of a node,
using a materialized path to the node. The query:

select n1.id
from nodes n1, nodes n2
where n1.path like n2.path || '%'
and n2.id = 14;

QUERY
PLAN
-------------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=0.00..120256.56 rows=17517 width=4) (actual
time=0.901..953.485 rows=7 loops=1)
Join Filter: (("inner".path)::text ~~ (("outer".path)::text ||
'%'::text))
-> Index Scan using nodes_id on nodes n2 (cost=0.00..35.08 rows=11
width=34) (actual time=0.050..0.059 rows=1 loops=1)
Index Cond: (id = 14)
-> Seq Scan on nodes n1 (cost=0.00..6151.89 rows=318489 width=38)
(actual time=0.010..479.479 rows=318489 loops=1)
Total runtime: 953.551 ms
(6 rows)

I've tried re-writing the query, which results in a different plan:

select id
from nodes
where path like (
select path
from nodes
where id = 14
limit 1
) || '%';

QUERY
PLAN
------------------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on nodes (cost=3.19..7747.52 rows=1592 width=4) (actual
time=0.230..226.311 rows=7 loops=1)
Filter: ((path)::text ~~ (($0)::text || '%'::text))
InitPlan
-> Limit (cost=0.00..3.19 rows=1 width=34) (actual
time=0.018..0.019 rows=1 loops=1)
-> Index Scan using nodes_id on nodes (cost=0.00..35.08
rows=11 width=34) (actual time=0.016..0.016 rows=1 loops=1)
Index Cond: (id = 14)
Total runtime: 226.381 ms
(7 rows)

While the plan looks a little better, the estimated rows are woefully
inaccurate for some reason, resulting in a seq scan on nodes.
If I perform the nested select in the second query separately, then use
the result in the outer select, it's extremely fast:

test=# select path from nodes where id = 14;
path
--------
/3/13/
(1 row)

Time: 0.555 ms

test=# select id from nodes where path like '/3/13/%';
id
---------
14
169012
15
16
17
169219
169220
(7 rows)

Time: 1.062 ms

I've vacuum full analyzed. PG version is 8.1.4

The nodes table is as follows:

test=# \d nodes
Table "public.nodes"
Column | Type | Modifiers
--------+-------------------------+-----------
id | integer | not null
path | character varying(2000) | not null
depth | integer | not null
Indexes:
"nodes_pkey" PRIMARY KEY, btree (id, path)
"nodes_id" btree (id)
"nodes_path" btree (path)

test# select count(*) from nodes;
count
--------
318489

Is there a way to perform this efficiently in one query ?

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Michael Stone 2006-09-18 23:58:13 Re: Large tables (was: RAID 0 not as fast as expected)
Previous Message Alex Turner 2006-09-18 23:14:56 Re: Large tables (was: RAID 0 not as fast as expected)