Enforcing unique column with triggers and hash

From: Data Growth Pty Ltd <datagrowth(at)gmail(dot)com>
To: pgsql-general(at)postgresql(dot)org
Subject: Enforcing unique column with triggers and hash
Date: 2010-05-28 02:16:06
Message-ID: AANLkTinkROFxSB5nWSlX8dLfpNStI8pwgm-ftnQebehL@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

I have a large table (200 million rows) with a column ( 'url' character
varying(255)) that I need to be unique.

Currently I do this via a UNIQUE btree index on (lower(url::text))

The index is huge, and I would like to make it much smaller. Accesses to
the table via this key are a tiny portion of the total (<1%), mainly being
INSERTs which are not time critical. SELECTs very rarely use this column in
WHERE, and never for time-critical queries.

I would also like to partition the table, but this column and index is not a
good choice for the partitioning criteria.

I thought I might be able to create another column ('url_hash'), being a say
64-bit hash of lower(url::text) with a UNIQUE constraint. 64 bits should
give me 36 bits of randomness over my 2^28 rows, making the probability of a
false match sufficiently rare (probably much rarer than the risk of a
programming bug causing a false match).

Alternatively, I could use an even shorter hash (say 32 bits), and allow for
the possibility of hash collisions.

Does anybody know of any reference to using a hash in postgresql as a
substitute for a unique btree? I would like to avoid re-inventing the wheel
if possible.

Stephen

Browse pgsql-general by date

  From Date Subject
Next Message Nilesh Govindarajan 2010-05-28 03:58:24 Re: No lidbl.so in libpq.so (postgresql 8.4.4)
Previous Message Tom Lane 2010-05-28 00:42:09 Re: List traffic