Re: How to generate unique hash-type id?

From: Craig Ringer <craig(at)postnewspapers(dot)com(dot)au>
To: Joe Kramer <cckramer(at)gmail(dot)com>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: How to generate unique hash-type id?
Date: 2010-01-29 13:24:11
Message-ID: 4B62E17B.50804@postnewspapers.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On 29/01/2010 4:20 PM, Joe Kramer wrote:
> Hello,
>
> I need to generate unique id which is not guessable unlike
> serial(integer) type. I need an id in format like md5 hash of random
> number.
> On top of that I need this id to be unique across multiple tables.
>
> Anyone had to solve this problem before? Can you post any recipes or
> best practices please?
>
> My questions:
>
> 1. Avoiding collisions.
> If I make an UNIQUE constraint and do generation of id triggered on
> INSERT. What if collision happens? DO I nee d to check if unique hash
> already exists and if not- regenerate.
> This looks too primitive. Is there a readily available function or
> methodology to do that?
>
> 2. Generating global unique id across multiple tables.
> How to do that? My only idea is to have separate table to keep all
> hashes and compare for collision against that table.
> Is there a better way? Maybe by creating some special serial type that
> is not integer but varchar?
>
> 3. what function to use to generate 64-bit random hash without much
> overhead to CPU?

When I ran into something somewhat akin to this I asked the list about a
non-repeating pseudo-random mapping function. Daniel Verite (on this
list) enlightened me about Feistel networks/cyphers, and even posted a
PL/PgSQL implementation!

It's documented here:

http://wiki.postgresql.org/wiki/Pseudo_encrypt

and has been extremely handy. It should fit you needs - just define a
sequence that you pull new input values from, and use that same sequence
across all tables that need unique values.

It's not trivial to assert, across multiple tables, that a value is
unique. You could do it with a trigger that checks each table that uses
such values (slow-ish but effective) or maintains a side table of values
in use. In either case it'd have to be added to every table that used
the pseudo_encrypt values.

By the way, if you intend to expose these to users you might also want
some kind of data entry error checking so that a typo can't accidentally
transform id `n' to id `m' with transposition of a single digit or the
like. The luhn algorithm provides a good way to do that with a simple
check digit - it's not cryptographically strong in that if you know the
values are checked with the luhn algorithm it's trivial to re-generate
the check digit, but it helps a log against casual scanning of the
number space and against accidental user data entry error. Check out:

http://wiki.postgresql.org/wiki/Luhn_algorithm

--
Craig Ringer

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Enrico Pirozzi 2010-01-29 13:29:48 ODBC mac os
Previous Message Ivan Sergio Borgonovo 2010-01-29 12:52:19 Re: How to generate unique hash-type id?