Re: Index on TEXT versus CHAR(32)... fast exact TEXT matching

From: Pierre-Frédéric Caillaud <lists(at)boutiquenumerique(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Re: Index on TEXT versus CHAR(32)... fast exact TEXT matching
Date: 2004-09-05 17:34:46
Message-ID: opsdu638jlcq72hf@musicbox
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

> CREATE TABLE a (id SERIAL, thetext TEXT, thetext_md5 CHAR(32));
> SELECT id FROM a WHERE thetext_m5d=md5('Some other text');

From your explanations :
- you need an exact match text->id
- you already know the text

Then, why bother with the TEXT column ?
I have no idea on the performance of pg text indexes, but I do know that
a table which fits in RAM, or an index which fits in RAM, is faster than
when it does not.

Try this :

create table my_searches (
id serial primary key,
text_md5 char[16] (or even a binary string)
);

Then create a table my_text (id,full_text) with appropriate foreign key
(to get a 1<->1 relationship between your tables) because you may still
want to remember the text. Index it on id of course.

My point is that your search table no longer has the big text field, it
has only constant size fields, (thus constant rowsize) and it is much
smaller. The index will also be smaller and maybe fit in RAM. You then :

SELECT id FROM my_text WHERE id in (SELECT id FROM my_searches WHERE
text_md5 = (your md5)) AND full_text=(your text);

Unless a MD5 collision occurs, which should be rare, the subquery will
return 1 result and the first SELECT will just do an index lookup on ID.

Now, ok, I'm stupid, I reimplemented the Hash index...

I'd be interested in your benchmarks of your real data with :
- table with full text + id, with btree index on text
- table with full text + id, with hash index on text
- the scheme I just came up with

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Pierre-Frédéric Caillaud 2004-09-05 17:39:27 Re: How to determine a database is intact?
Previous Message Tom Lane 2004-09-05 16:44:33 Re: readline selection (was Re: [GENERAL] psql leaking? - SOLVED)