Re: uuid type for postgres

From: Bob Ippolito <bob(at)redivi(dot)com>
To: jonah(dot)harris(at)gmail(dot)com
Cc: "mark(at)mark(dot)mielke(dot)cc" <mark(at)mark(dot)mielke(dot)cc>, Greg Sabino Mullane <greg(at)turnstep(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: uuid type for postgres
Date: 2005-09-08 17:24:47
Message-ID: 4FFD90A6-C245-478C-A213-55D7C35BB6D1@redivi.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-sql

One reason to use a UUID type over a naively stored hash for this
purpose is that it takes up half the space as naively stored MD5 and
40% of the space as naively stored SHA1. Granted, it's easy enough
to pack them, but packed MD5 does have the same storage requirements
as UUID and it won't be quite as convenient to use as a native UUID
type out of the box.

-bob

On Sep 8, 2005, at 9:02 AM, Jonah H. Harris wrote:

> Mark,
>
> I think what Greg suggested was sha1(number) as the key instead of
> requiring uuid as the key... it would perform the same function as
> far as you r use case is concerned.
>
> As a similar example (using MD5):
>
> CREATE SEQUENCE marks_seq START 1 INCREMENT 1;
> CREATE TABLE your_tbl (
> your_key VARCHAR(32) NOT NULL DEFAULT md5(nextval
> ('marks_seq')),
> your_picture BYTEA NOT NULL,
> PRIMARY KEY (your_key));
>
> INSERT INTO your_tbl (your_picture) VALUES ('some bytea data');
>
> select * from your_tbl ;
> your_key | your_picture
> ----------------------------------+-----------------
> c4ca4238a0b923820dcc509a6f75849b | some bytea data
>
> same functionality from your standpoint.
>
>
>
> On 9/8/05, mark(at)mark(dot)mielke(dot)cc <mark(at)mark(dot)mielke(dot)cc> wrote:
> On Thu, Sep 08, 2005 at 01:45:10PM -0000, Greg Sabino Mullane wrote:
> > > For a rather simple example, consider a site that associates a
> picture
> > > with each member. If the pictures are named 1.jpg, 2.jpg,
> 3.jpg, etc.
> > > it makes it ridiculously easy to write a script to pull all of the
> > > pictures off the site. This can be bothersome, as the only type of
> > > person who would do this, is the type of person with an
> illegitimate
> > > motivation. I want the data to be easily and freely accessible as
> > > specific objects, but I do not wish to provide an easy way of
> > > dumping all of the data as a unit.
> > I don't think security through obscurity is a good reason to add
> something
> > to PostgreSQL. Either make the pictures accessible to everyone,
> or put
> > some other security mechanism in place. As far as pictures, you
> could
> > certainly used the SHA1 checksum as your primary key, which would
> satisfy
> > all your requirements.
>
> I think you missed two parts. The first is, that it isn't really an
> attempt at security. The pictures are still freely available. There
> is no need to block accesses.
>
> The goal is to prevent a dump of the database. Nobody should be
> able to
> trivially download all the pictures at one time.
>
> I'm not sure what you are suggesting the SHA1 checksum be applied to.
> Not the picture, for sure, as this would require that my tables store
> a primary key, and a picture key, leading me right back to something
> that seems a little ridiculous.
>
> If you mean the SHA1 checksum of some constant secret password, and
> the primary key, possibly SERIAL, then sure - but that's ignoring the
> other benefits of UUID. If I'm going to use UUID, I might as well use
> only UUID, and use the SHA1 checksum of the constant secret password,
> and the UUID. So, the idea has merit - and was suggested by another
> person - but it doesn't mean using the UUID alone, without all the
> extra layers on top I would have to write, is the wrong choice. Why
> would I go out of my way to jump through so many hoops, when I can
> very easily, use the UUID?
>
> You're asking me to do a lot, only so that you can point out that
> 'UUID' is not necessary. Heck, I could use the timestamp, and a
> random key as two separate fields. There are always work arounds.
>
> Is it good to use work arounds? Or a type that already works fine
> for the purposes? :-)
>
> > > In my case, it isn't only pictures. I don't want people pulling
> all
> > > the data off the site as a dump, and using it how they wish,
> but I do
> > > wish to make the data freely available, and easily accessible
> from a
> > > web browser.
> > .htaccess? Cookies? Encrypted data? iptables? All of these are
> better
> > solutions than random filenames.
>
> Nope. The files are publically accessible. No reason the secure their
> access, and securing their access makes the whole system less
> efficient.
>
> > > Not that everybody should rip out SERIAL and replace it with UUID,
> > > but it really isn't that bad, and in some cases, such as mine,
> > > I don't see the point of using both, and choose to instead allow
> > > UUID to solve many of my concerns at the same time, with an
> > > acceptable for me cost in database pages.
> > We're doing something similar to UUIDs here - LSID (Life Science
> > Identifiers). But being as the whole point of LSIDs is to share
> > data, the "random" factor is not needed. If ever PG did do something
> > like UUID (and despite the paragraphs above, I could see a use
> for it),
> > I'd rather do something like picking from a predetermined range
> of values
> > at random and deplete the pool as you go rather than create large
> > values with a low (but finite) probability of collision.
>
> I don't need the random factor. I only need it to be difficult to
> guess.
>
> The timestamps have sufficient granularity, that they are effectively
> difficult to guess, and a foreach loop would be expensive. If none of
> the bits were random, I would still be happy.
>
> If I had LSID support, and no UUID support, perhaps I'd use LSID
> instead. I have no special attachment to UUID, other than it satisfies
> my concerns and purposes in a convenient manner. I can merge my tables
> without fiddling with the sequence definitions. I can hand out
> external identifiers for specific objects which are effectively
> unguessable. I can avoid having two primary keys, and use only the
> one. Any type that satisfied these concerns would make me happy.
>
> Cheers,
> mark
>
> --
> mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com
> __________________________
> . . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood
> Coder
> |\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
> | | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa,
> Ontario, Canada
>
> One ring to rule them all, one ring to find them, one ring to
> bring them all
> and in the darkness bind them...
>
> http://mark.mielke.cc/
>
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 6: explain analyze is your friend
>
>
>
> --
> Respectfully,
>
> Jonah H. Harris, Database Internals Architect
> EnterpriseDB Corporation
> http://www.enterprisedb.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2005-09-08 17:31:36 Re: initdb profiles
Previous Message Peter Eisentraut 2005-09-08 17:20:59 Re: initdb profiles

Browse pgsql-sql by date

  From Date Subject
Next Message Michael Fuhr 2005-09-08 20:08:02 Re: Statistics from Sequences
Previous Message Jonah H. Harris 2005-09-08 17:03:11 Re: uuid type for postgres