Re: Hash Function: MD5 or other?

From: Alex Stapleton <alexs(at)advfn(dot)com>
To: Peter Fein <pfein(at)pobox(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Hash Function: MD5 or other?
Date: 2005-06-14 09:06:49
Message-ID: 34DEB1AE-95EE-4462-8E51-9FEE07A9C565@advfn.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general


On 13 Jun 2005, at 23:49, Peter Fein wrote:

> Hi-
>
> I wanted to use a partially unique index (dependent on a flag) on a
> TEXT
> column, but the index row size was too big for btrees. See the thread
> "index row size 2728 exceeds btree maximum, 2713" from the
> beginning of
> this month for someone with a similar problem. In it, someone
> suggests
> indexing on a hash of the text. I'm fine with this, as the texts in
> question are similar enough to each other to make collisions unlikely
> and a collision won't really cause any serious problems.
>
> My question is: is the builtin MD5 appropriate for this use or
> should I
> be using a function from pl/something? Figures on collision rates
> would
> be nice as well - the typical chunk of text is probably 1k-8k.
>
> Thanks!
>

As others have said MD5 isn't the fastest one out there. However no
cryptographically secure hashes are really that fast. However you
can probably get away with using a CRC hash which is long enough to
reduce your chances of collision a lot. However, PostgreSQL doesn't
have a built in CRC function, which is a bit of a pain unless your
prepared to implement one, or use pl/* to do it, which sounds like
overkill. I suggest you run some benchmarks on MD5 and see if it's
fast enough to meet your current (and perhaps future) needs.

You could of course, just use a hash index on your text field! I
think that would probably cope with larger data sets OK. It has the
advantage of handling collisions for you as well :) However it means
you have to transfer large amounts of data around, so if network
speed ever becomes a limitation, MD5 hashing (or enabling compression
on your PgSQL connection) may help.

> --
> Peter Fein pfein(at)pobox(dot)com
> 773-575-0694
>
> Basically, if you're not a utopianist, you're a schmuck. -J. Feldman
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: Have you searched our list archives?
>
> http://archives.postgresql.org
>
>
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Együd Csaba 2005-06-14 11:35:45 Re: PG 8.0.1 is getting slow in 24 hours. Only daily VACUUM
Previous Message Ioannis Theoharis 2005-06-14 09:05:56 Re: suse 9.2