Re: PATCH: CITEXT 2.0 v2

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: CITEXT 2.0 v2
Date: 2008-07-07 23:24:27
Message-ID: 551B62DD-77F5-4CA3-9C6E-14E38A64EF26@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jul 7, 2008, at 08:01, Andrew Dunstan wrote:

> What does still bother me is its performance. I'd like to know if
> any measurement has been done of using citext vs. a functional index
> on lower(foo).

Okay, here's a start. The attached script inserts random strings of
1-10 space-delimited words into text and citext columns, and then
compares the performance of queries with and without indexes. The
output for me is as follows:

Loading words from dictionary.
Inserting into the table.

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 254.254 ms
SELECT * FROM try WHERE citext = 'food';
Time: 288.535 ms

Test LIKE and ILIKE
SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%');
Time: 209.385 ms
SELECT * FROM try WHERE citext ILIKE 'C%';
Time: 236.186 ms
SELECT * FROM try WHERE citext LIKE 'C%';
Time: 235.818 ms

Adding indexes...

Test =.
SELECT * FROM try WHERE LOWER(text) = LOWER('food');
Time: 1.260 ms
SELECT * FROM try WHERE citext = 'food';
Time: 277.755 ms

Test LIKE and ILIKE
SELECT * FROM try WHERE LOWER(text) LIKE LOWER('C%');
Time: 209.073 ms
SELECT * FROM try WHERE citext ILIKE 'C%';
Time: 238.430 ms
SELECT * FROM try WHERE citext LIKE 'C%';
Time: 238.685 ms
benedict%

So for some reason, after adding the indexes, the queries against the
CITEXT column aren't using them. Furthermore, the `lower(text) LIKE
lower(?)` query isn't using *its* index. Huh?

So this leaves me with two questions:

1. For what reason would the query against the citext column *not* use
the index?

2. Is there some way to get the CITEXT index to behave like a LOWER()
index, that is, so that its value is stored using the result of the
str_tolower() function, thus removing some of the overhead of
converting the values for each row fetched from the index? (Does this
question make any sense?)

Thanks,

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2008-07-07 23:26:36 Re: PATCH: CITEXT 2.0 v2
Previous Message David E. Wheeler 2008-07-07 23:09:18 Re: \SET QUIET and \timing