Re: LIKE and REGEX optimization

From: Chris Browne <cbbrowne(at)acm(dot)org>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: LIKE and REGEX optimization
Date: 2008-01-15 20:29:49
Message-ID: 60wsqasugy.fsf@dba2.int.libertyrms.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Kico Zaninetti <kicozaninetti(at)gmail(dot)com> writes:
> Hi all.
>
> I have a database with 62 million registers and I have to make a
> SELECT using LIKE.
>
> This is my select:
> SELECT * FROM phone WHERE name LIKE = '%ZANINETTI%' AND city = 'SAO
> PAULO' AND state = 'SP'
>
> I have an index created like this:
> CREATE INDEX "telefones_idx2" ON "public"."phone"
> USING btree ("name" varchar_pattern_ops, "city", "state");
>
> When I explain the query I have this:
> QUERY PLAN
> Bitmap Heap Scan on telefones (cost=1031528.27..2726942.75 rows=4
> width=145)
> Recheck Cond: (((city)::text = 'SAO PAULO'::text) AND ((state)::text
> = 'SP'::text))
> Filter: ((name)::text ~~ '%ZANINETTI%'::text)
> -> Bitmap Index Scan on telefones_idx2 (cost=0.00..1031528.27
> rows=1712760 width=0)
> Index Cond: (((city)::text = 'SAO PAULO'::text) AND
> ((state)::text = 'SP'::text))
>
>
> The cost is over than 1 million! It's to high and I have to reduce it.
> Does someone know how can I make it?

You may be misreading what the cost is...

It looks to me like the planner is estimating that there are 1712760
rows where city = 'SAO PAULO' and state = 'SP'; is that estimate way
off?

If it is, then maybe you need to ANALYZE the table, perhaps combined
with altering the histogram size for city/state.

(e.g. - alter table phone alter column city set statistics 100;
alter table phone alter column state set statistics 100;)

If altering statistics, then re-ANALYZEing the table helps, that may
resolve things.

Unfortunately, the LIKE clause isn't going to readily take advantage
of regular indices, because you have put in '%' at beginning and end,
which means that regular indices will not be usable.

I wouldn't be too surprised if that is the case; Sao Paulo is about
the most populous South American city, with ~11M people, so having a
database with 1.7M phone numbers in that city does not seem "out
there."

The only idea that comes to mind to follow that is to look into
tsearch. On PostgreSQL 8.3, it's built in; on 8.2 and earlier, it's a
"contrib" module. It allows indexing on words inside columns, which
would seem to fit your requirement.
--
select 'cbbrowne' || '@' || 'linuxfinances.info';
http://cbbrowne.com/info/sap.html
Signs of a Klingon Programmer #3: "By filing this TPR you have
challenged the honor of my family. Prepare to die!"

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2008-01-15 20:30:04 Re: Segmentation fault with 8.3 FTS ISpell
Previous Message Tom Lane 2008-01-15 20:04:49 Re: out of memory with INSERT INTO... SELECT...