Estimation problem with a LIKE clause containing a /

From: "Guillaume Smet" <guillaume(dot)smet(at)gmail(dot)com>
To: pgsql-performance <pgsql-performance(at)postgresql(dot)org>
Subject: Estimation problem with a LIKE clause containing a /
Date: 2007-11-07 12:53:16
Message-ID: 1d4e0c10711070453q214f089cpd144cbb5193d3c4f@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-performance

Hi all,

While studying a query taking forever after an ANALYZE on a never
analyzed database (a bad estimate causes a nested loop on a lot of
tuples), I found the following problem:
- without any stats (I removed the line from pg_statistic):
ccm_prod_20071106=# explain analyze select * from cms_items where
ancestors LIKE '1062/%';
QUERY PLAN
--------------------------------------------------------------------------------------------------------------
Seq Scan on cms_items (cost=0.00..689.26 rows=114 width=587) (actual
time=0.008..21.692 rows=11326 loops=1)
Filter: ((ancestors)::text ~~ '1062/%'::text)
Total runtime: 31.097 ms
-> the estimate is bad (it's expected) but it's sufficient to prevent
the nested loop so it's my current workaround

- after analyzing the cms_items table (statistics is set to 10 but
it's exactly the same for 100):
ccm_prod_20071106=# explain analyze select * from cms_items where
ancestors LIKE '1062/%';
QUERY PLAN
------------------------------------------------------------------------------------------------------------
Seq Scan on cms_items (cost=0.00..689.26 rows=*1* width=103) (actual
time=0.010..22.024 rows=11326 loops=1)
Filter: ((ancestors)::text ~~ '1062/%'::text)
Total runtime: 31.341 ms
-> this estimate leads PostgreSQL to choose a nested loop which is
executed more than 11k times and causes the query to take forever.

- if I remove the / from the LIKE clause (which I can't as ancestors
is more or less a path):
ccm_prod_20071106=# explain analyze select * from cms_items where
ancestors LIKE '1062%';
QUERY PLAN
---------------------------------------------------------------------------------------------------------------
Seq Scan on cms_items (cost=0.00..689.26 rows=*9097* width=103)
(actual time=0.043..25.251 rows=11326 loops=1)
Filter: ((ancestors)::text ~~ '1062%'::text)
Total runtime: 34.778 ms

Which is a really good estimate.

Is it something expected?

The histogram does contain values beginning with '1062/' (5 out of 10)
and the cms_items table has ~ 22k rows.

Version is PostgreSQL 8.1.8 on i686-redhat-linux-gnu, compiled by GCC
gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3). I checked the release
notes between 8.1.8 and 8.1.10 and I didn't find anything relevant to
fix this problem.

Thanks for any help.

Regards,

--
Guillaume

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2007-11-07 13:24:14 Re: pg_resetxlog output clarification
Previous Message Stefan Kaltenbrunner 2007-11-07 12:42:14 Re: tribble.postgresql.org - planned maintenance downtime

Browse pgsql-performance by date

  From Date Subject
Next Message Alexander Staubo 2007-11-07 13:25:40 Re: Estimation problem with a LIKE clause containing a /
Previous Message Jens-Wolfhard Schicke 2007-11-06 20:48:12 Subpar Execution Plan