Re: uuid type for postgres

From: mark(at)mark(dot)mielke(dot)cc
To: "Jonah H(dot) Harris" <jonah(dot)harris(at)gmail(dot)com>
Cc: josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, nathan wagner <nw(at)hydaspes(dot)if(dot)org>
Subject: Re: uuid type for postgres
Date: 2005-09-06 20:38:11
Message-ID: 20050906203811.GA3069@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-sql

On Tue, Sep 06, 2005 at 03:57:55PM -0400, Jonah H. Harris wrote:
> I agree with Josh on the UUID type, it gets abused far too often and (IMHO)
> isn't widely enough used to belong in the core.

There is much in PostgreSQL from my perspective that falls under the
category of 'most advanced open source SQL server in the world', and
not at all in the category of 'widely used'. Unless the history of
PostgreSQL isn't to be accepted in terms of direction for PostgreSQL,
I don't see why such a useful building block shouldn't be supported
by the core. It may not be widely used, because it doesn't exist.
Not yet, anyways. pguuid is broken, remember?

Which isn't to say that it should be in the core, or it must be in the
core. It is to say, however, that I don't buy your arguments. Abused how?
How can you judge widely used for something that doesn't exist?

I'm not very newbie oriented. I think I tend to assume that anybody
using PostgreSQL must know what they are doing. This may be very
wrong of me to assume. I assume you all know what you are doing,
for example.

I'm curious as to what sort of abuses you and Josh are speaking about.
I'd particularly like to know if you think I am abusing it, as I don't
believe that I am. I would also be interested in knowing what you felt
would be an abuse with UUID, that wouldn't be an abuse with SERIAL.
Should we tell newbies not to use SERIAL? Perhaps we should. Do we?

> Couldn't you just fix the
> problem in pguuid rather than write a whole new type?

I'll submit my patches to pguuid once I've had some more production
use of it. For the little bit of code that it was, there were some
pretty bad bugs. It was sometimes crashing, due to what appears to be
a palloc() that was one byte short (it reliably crashed on me, the
line is wrong as written, and after adding + 1, it no longer crashes
at all). The operator definitions weren't correct, and the indices on
columns of the type were not being preferred by the query planner (a
good thing too - if they were used for any operator other than '=',
the errors in the operator definitions would have caused very
unexpected results).

So yes, pguuid can be fixed. I'm not sure that pguuid makes a good
model for a PostgreSQL extension, but it could be resurrected and
used. (I think the project showed no real updates since 2003?)

But - it's GPL, limiting its use with regard to PostgreSQL
distribution, and it requires special compilation, which as Nathan
found out, doesn't work on MacOS.

I think the GPL reason alone is a compelling reason to create a new
extension. Using a more portable UUID base library (either written
from scratch, or re-used from some other place providing a
compatible license) is a compelling reason to create a new
extension.

As to whether it should be in core - I think that a fully functional
module could be widely used, especially in larger systems that are
having difficulty spreading transactions across multiple
machines. This is the domain that UUID shines in. I can write a
transaction at my site without being very worried that it will collide
with a transaction at your site. We don't have to be connected in real
time.

In my choice of use, I'm using them instead of SERIAL columns, as I
wish to have more freedom merging production data with test data. I
wish to continually import production data into my test environment,
in a single direction. UUID will prevent conflicts from occurring.
SERIAL cannot (although in theory, I could set my copy of the serial
value to 1 billion or something hacky - but that doesn't scale in a
simple fashion to having several test environments). My other
preferred use, is to expose a handle on the data to the world in
select circumstances. I don't want to give them a SERIAL column, as
it lets them be able to predict what else they might have access
to. I'm not using it to secure the data (other routines will do this),
but I am using it to hide the data. Why should people accessing my
system know how many records exist in my tables? Why should they be
able to predict the next value? Why should they be able to relate
the data, or mine my data in the case that I allow read to all?

Yes, I can avoid exposing the UUID/SERIAL in most circumstances. They
can get at most of the data through a name based path. For a few of my
pieces of data, though, I want to provide a secondary means of
accessing the data that does not require a path. Names cause problems,
especially if the names contain UNICODE characters, or if the names
are very long. Giving them a reliably unique handle of fixed length
is highly desirable to me in these circumstances.

Anyways, I'm using a fixed up pguuid right now, and getting along fine.
Nathan, under MacOS, isn't - with or without my patch. And I believe
he stated he had issues with the GPL license.

Josh may be correct, that in terms of a position statement, my
arguments are all over the map, and ignorant of newbies. I'm not a
speaker, or a writer. I'm a technical person who was frustrated with
SERIAL, happy with UUID, frustrated to find pguuid broken, happy that
it was fixable, frustrated with having to compile and install it
separately, happy that there is interest from others (Nathan) in
attempting to have this building block worked on, and perhaps
eventually added to the core.

Personally, I'm not sure what the big opposition to UUID is all about.
Even abused, it can do a better job than SERIAL at quadruple the
storage requirements (not a big deal when considering PostgreSQL's
record overhead). If it'll make it more acceptable, I'll write the
UUID implementation myself, and release it under whatever license
you find most beneficial to PostgreSQL. :-)

Cheers,
mark

> On 9/6/05, Josh Berkus <josh(at)agliodbs(dot)com> wrote:
> >
> > Mark,
> >
> > > I suggest that UUID be recommended in place of SERIAL for certain
> > > classes of applications, and that it therefore belongs in the core.
> > > UUID and SERIAL can be used together (although, once you have a UUID,
> > > it may not be useful to also have a SERIAL).
> >
> > I think that, if you want to push a refactored UUID type for PostgreSQL
> > 8.2, that you'd better separate your database design arguments from your
> > inclusion arguments.
> >
> > For example, you might get my agreement that it would be useful to have a
> > UUID as a core type; you would *never* get my agreement to recommend using
> > UUID to newbies. I have seen *far* too many abuses of UUIDs in really
> > bad database design. People who use them should be experienced enough to
> > know what they're doing.
> >
> > --
> > --Josh
> >
> > Josh Berkus
> > Aglio Database Solutions
> > San Francisco
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 9: In versions below 8.0, the planner will ignore your desire to
> > choose an index scan if your joining column's datatypes do not
> > match
> >

--
mark(at)mielke(dot)cc / markm(at)ncf(dot)ca / markm(at)nortel(dot)com __________________________
. . _ ._ . . .__ . . ._. .__ . . . .__ | Neighbourhood Coder
|\/| |_| |_| |/ |_ |\/| | |_ | |/ |_ |
| | | | | \ | \ |__ . | | .|. |__ |__ | \ |__ | Ottawa, Ontario, Canada

One ring to rule them all, one ring to find them, one ring to bring them all
and in the darkness bind them...

http://mark.mielke.cc/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2005-09-06 20:49:28 Re: uuid type for postgres
Previous Message Tom Lane 2005-09-06 20:35:45 Re: purge hash table, how to?

Browse pgsql-sql by date

  From Date Subject
Next Message David Fetter 2005-09-06 20:49:28 Re: uuid type for postgres
Previous Message Jonah H. Harris 2005-09-06 20:32:06 Re: uuid type for postgres