Re: Create on insert a unique random number

From: "D'Arcy J(dot)M(dot) Cain" <darcy(at)druid(dot)net>
To: Steve Midgley <public(at)misuse(dot)org>
Cc: lance(at)uiuc(dot)edu, pgsql-sql(at)postgresql(dot)org
Subject: Re: Create on insert a unique random number
Date: 2008-03-19 16:37:42
Message-ID: 20080319123742.9e1e62b5.darcy@druid.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

On Wed, 19 Mar 2008 08:28:28 -0700
Steve Midgley <public(at)misuse(dot)org> wrote:
> I'm not following this line. Maybe we're talking about two different
> things here.. I don't know if Lance is using "CRUD" methodology per se,
> but that's a well accepted web approach and uses (generally) serial
> primary keys in the URL structure as (where numbers are serial pk's):
>
> [website]/contact/12345
> [website]/property/45678
> [and the client sends GET, POST, PUT, DELETE http requests, or
> mimics, to activate various functions]

Yes, I do this all the time.

> Whether CRUD of otherwise, in the model I was promoting, there would be
> two index columns in the table along with other data, a public index
> and a serial primary key. The public index is based on the primary key:
>
> pk | public_pk
> 1 | md5(1 + fixed salt)
> 2 | md5(2 + fixed salt)
> ...
>
> AFAIK, an MD5 hash is guaranteed to generate a unique output for any
> unique input, so the serial key and fixed salt would guarantee no hash
> index collisions on the MD5 output. Of course if a competitor knows
> you're using MD5 and they know your salt, they could calculate all the
> md5 integer hashes and see which ones exist..
>
> But I could care less if he uses md5 or sha-1 or Guids! (I just picked
> MD5 because another poster recommended it and it's very easy to
> implement in Pg). The point I care about is that there would be a
> public_pk that associates to one-and-only-one serial pk. Also that
> public_pk should be 1) not easily guessable, 2) non-clustering (and
> therefore non-serial). Then his url's would look like something like:
>
> [website]/contact/c4ca4238a0b923820dcc509a6f75849b
> [website]/property/c81e728d9d4c2f636f067f89cc14862c

Right and, as you state above, they could be guessable if someone gets
their hands on a relatively small amount of information. If you simply
generate a random string of n characters where n is based on the amount
of security you need, you can store that and store it in a separate
field in the record. You don't even need to make them unique. Just
incorporate the serial number as well as the random string. There may
conceivably be two records with "1ed6f54e5636837ddae4ef33397ee2cb" as
the key but only one that looks like
"021857.1ed6f54e5636837ddae4ef33397ee2cb". In fact, you could md5 the
serial key and just string the two together if you really wanted
security through more obscurity but that's probably overkill.

The point here is that no one can guess what someone's URL is, even if
they know the ID, administrators can call up records by ID and
individual secret keys can be changed if compromised without affecting
anyone else. Also, it's a normalized table. Storing a value that you
can generate is unnormalized.

> The original post did not want users to be able to type in random
> integers like:
>
> /contact/343

See above. That's not what I was suggesting.

> And find out if that record #343 exists or not (regardless of whether
> they can get access to the record - the error generated on
> no-authorization may be different from record-not-found). So starting
> at a million does not fix the OP's issue.

Certainly you would generate the same error to the web user, even if
you differentiate in your internal error log.

> From my perspective, wherever you start your serial index, competitors
> can watch it grow over time, if it's a numeric serial. That could be
> more valuable in many businesses than knowing the initial size of the
> table.

I guess it depends on the business case. Certainly we can always find
a use case where a specific solution fails but that's not how we really
work. We get all the details of the requirements and then code what
solves them. See above for the "overkill" method that solves that
issue if it really is one.

> Anyway, I hope that clears up what I was recommending! I didn't
> anticipate it would stir up this much analysis and I hope the OP finds
> your input and mine useful in coming up with a final answer to his
> issue. Thanks for taking the time to consider the issue and I'll look
> forward to any additional ideas or comments you have on this too!

Yes, discussion is always useful, and fun. :-)

--
D'Arcy J.M. Cain <darcy(at)druid(dot)net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Tena Sakai 2008-03-19 17:56:27 compare 2 tables in sql
Previous Message Steve Midgley 2008-03-19 15:28:28 Re: Create on insert a unique random number