Quick Links

Re: bad query plans for ~ "^string" (and like "string%") (8.3.6)

From:	Robert Haas <robertmhaas(at)gmail(dot)com>
To:	Marinos Yannikos <mjy(at)geizhals(dot)at>
Cc:	pgsql-performance(at)postgresql(dot)org
Subject:	Re: bad query plans for ~ "^string" (and like "string%") (8.3.6)
Date:	2009-04-08 13:53:58
Message-ID:	603c8f070904080653y5ca8ea3buf70651291b9d955f@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Wed, Apr 8, 2009 at 9:42 AM, Marinos Yannikos <mjy(at)geizhals(dot)at> wrote:
> It seems that ANALYZE does not really sample text column values as much as
> it could. We have some very bad query plans resulting from this:
>
> ...
> -> Bitmap Index Scan on m_pkey (cost=0.00..28.61 rows=102 width=0)
> (actual time=171.824..171.824 rows=683923 loops=1)
> Index Cond: ((e >= 'ean'::text) AND (e < 'eao'::text)
>
> This gets even worse for longer strings, where we know that many matching
> rows exist:
>
> # explain analyze select substring(e,5) from m where id=257421 and e ~
> '^ean=';
> QUERY PLAN
> ---------------------------------------------------------------------------------------------------------------------------
> Index Scan using m_pkey on m (cost=0.00..12.50 rows=1 width=60) (actual
> time=1623.795..1703.958 rows=18 loops=1)
> Index Cond: ((e >= 'ean='::text) AND (e < 'ean>'::text))
> Filter: ((e ~ '^ean='::text) AND (id = 257421))
> Total runtime: 1703.991 ms
> (4 rows)
>
> Here it would be much better to use the existing index on "id" (btree) first
> because the current index condition selects 683k rows whereas the result
> contains 18 rows. Using the index on id would yield 97 rows to filter.
>
> Is it possible to work around this problem somehow, other than adding
> partial indexes for the ~ / LIKE condition (when it's constant) or a
> 2-dimensional index?
>
> (what exactly does ANALYZE look at for text columns? in our case, about 7%
> of the rows match the index condition, so it seems that left-anchored
> regexp/like matches are not evaluated using the gathered most-common-value
> list at all)

What are you using for default_statistics_target?

You can see the gathered data in pg_statistic.

...Robert

In response to

bad query plans for ~ "^string" (and like "string%") (8.3.6) at 2009-04-08 13:42:12 from Marinos Yannikos

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Tom Lane	2009-04-08 14:28:09	Re: bad query plans for ~ "^string" (and like "string%") (8.3.6)
Previous Message	Marinos Yannikos	2009-04-08 13:50:58	Re: bad query plans for ~ "^string" (and like "string%") (8.3.6)