From: | Stuart Bishop <stuart(at)stuartbishop(dot)net> |
---|---|
To: | Matthew Wilson <matt(at)tplus1(dot)com> |
Cc: | pgsql-general(at)postgresql(dot)org |
Subject: | Re: Need help requiring uniqueness in text columns |
Date: | 2008-01-02 02:43:59 |
Message-ID: | 477AFA6F.70602@stuartbishop.net |
Lists: | pgsql-general |
Matthew Wilson wrote:
> I have a table MESSAGE with a text column and a timestamp column. I
> want to make sure that I have no rows with duplicates for both values.
> I have two ideas right now for how to do this.
>
> IDEA 1:
>
> CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, my_text_col);
>
> IDEA 2:
>
> CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));
>
> I am speculating that an index on the md5 is cheaper than on a text
> column. I'm willing to risk the chance of a hash collision.
>
> I don't want to use this index to allow searching inside the text
> column. I just want to protect against duplication.
>
> Are either of these approaches any good? What are other ways to
> guarantee uniqueness for the pair of my timestamp column and my text
> column?
Idea 1 will only work if your text values never grow too long. A btree
index entry is limited to roughly a third of a page (about 2700 bytes with
the default 8 kB page size), and PG raises an error, not just a warning, on
the insert or update that exceeds it.
Idea 2 works. You can even use it for exact-match searching if you query
with WHERE my_timestamp_col='...' AND md5(my_text_col)=md5('...')
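As a rough sketch of Idea 2 end to end: the column names below come from the
question, but the table definition, the column types, the NOT NULL
constraints and the sample values are assumptions made for illustration. The
final condition in the SELECT is an optional extra that rechecks the full
text, so an md5 collision cannot return the wrong row.

```sql
-- Hypothetical table: column names are from the thread, types are assumed.
-- NOT NULL is assumed too, because a unique index treats NULLs as distinct.
CREATE TABLE message (
    my_timestamp_col timestamp NOT NULL,
    my_text_col      text      NOT NULL
);

-- Idea 2: index a fixed-length md5 hex digest instead of the full text.
CREATE UNIQUE INDEX nodups ON message (my_timestamp_col, md5(my_text_col));

INSERT INTO message VALUES ('2008-01-02 02:43:59', 'hello world');

-- Running the same INSERT a second time fails with a unique-violation error.

-- Exact-match lookup written so the planner can use the expression index.
SELECT *
FROM message
WHERE my_timestamp_col = '2008-01-02 02:43:59'
  AND md5(my_text_col) = md5('hello world')
  AND my_text_col = 'hello world';  -- recheck guards against md5 collisions
```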
Depending on your app, you might want to round your timestamps to avoid
getting two identical messages in the system only a millisecond apart.
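One way to do that rounding, sketched under the assumption that
my_timestamp_col is a plain timestamp (without time zone), is to index
date_trunc() of the column. The one-second granularity and the index name
nodups_per_second are illustrative choices, not something from the original
question. With a timestamptz column this would not work as written, since
date_trunc() on timestamptz is not immutable and so cannot be used in an
index expression.

```sql
-- Hypothetical table: column names are from the thread, types are assumed.
CREATE TABLE message (
    my_timestamp_col timestamp NOT NULL,
    my_text_col      text      NOT NULL
);

-- Uniqueness is enforced per whole second rather than per exact timestamp.
CREATE UNIQUE INDEX nodups_per_second
    ON message (date_trunc('second', my_timestamp_col), md5(my_text_col));

INSERT INTO message VALUES ('2008-01-02 02:43:59.001', 'hello world');

-- Same text within the same second: rejected with a unique-violation error.
-- INSERT INTO message VALUES ('2008-01-02 02:43:59.002', 'hello world');

-- Same text in a different second: accepted.
INSERT INTO message VALUES ('2008-01-02 02:44:00.000', 'hello world');
```

The stored values keep their full precision; only the uniqueness check is
rounded. Note that truncation still allows two identical messages that fall
on either side of a second boundary.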
--
Stuart Bishop <stuart(at)stuartbishop(dot)net>
http://www.stuartbishop.net/