Re: Using GUIDs

From: "Aaron Bono" <postgresql(at)aranya(dot)com>
To: "Dennis Wagner" <wagner(at)spheron(dot)com>
Cc: pgsql-admin(at)postgresql(dot)org
Subject: Re: Using GUIDs
Date: 2006-06-27 18:02:22
Message-ID: bf05e51c0606271102m6e4d1163wa8fb93e559b6fc11@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

> > By its definition, GUID ( http://en.wikipedia.org/wiki/GUID ) is not
> > guaranteed to be unique. As a result, it could cause problems in
> > tables that have a large number of rows - actually this could result
> > in hard to reproduce, seemingly random errors in your systems that use
> > the database. How does a GUID help you over a BIGSERIAL?
> It would help me in restoring archived projects. There is a quite
> frequent scenario where a once archived (e.g. in a file) subset of the
> db-contents will be restored. In that occasion I cannot be sure wether
> the bigserial-values the entries has before are free. One can easily
> imagine a system holding one project v(aka. subset of the content). Then
> one archives that projects, deletes the db, resets it's schema and then
> enteres two new projects.

Why do you need to delete the db and reset it? Why not just truncate the
tables and leave the database and sequence numbers as they are? And even if
you do delete and reset the db, all you need to do is reseed the sequences
to start where they left off so the ID remains unique even across archives.

Afetr that, the once archived project is
> restored from the archive. That would most likely fail as the sequences
> will have given the same IDs (starting with 1) to elements of the other
> two projects. I would need a handmade merge which would really be
> painfull.
> Using GUIDs that use the MAC-adresses as well as the creation-time (as
> in version 4 of uuids e.g.) would most likely (I see, there IS the
> probability but I doubt it happing inside a common LAN, at least in the
> life-time of the product) create distinct GUIDS that would reside
> side-by-side without any trouble.

You could also make your MAC address/timestamp separate columns which
combined are the primary key. Then you don't need any GUID. Add an insert
trigger to the tables to detect possible duplicates and tweak the timestamp
a little or throw in a third smallint column into the primary key that the
trigger can use to avoid the conflicts. I do something like this with my
log files - I name them with a timestamp, check to see if that name exists
and then add a -1, -2, -3, etc. to the end of the file name if the file
already exists.

>
> > Unless you are generating the PK in multiple databases, a BIGSERIAL
> > should act as a UUID.
> >
> > Maybe if you provide a little more detail on why you want to use GUID
> > or UUID fields we can help you come up with more options.
> >
> We're talking about a Client-Server-Application that would normally be
> run on about 10 to 50 clients simultanously using the same backend.
> Project-archivment will be frequent as single projects are open for
> approximatly one to three month.
> I'm well aware that as GUIDS are not really guaranteed to be unique
> there could be scenarios where double ID's would occur. But I could
> easily sense that before entering the project and then do this
> praticular one by hand. I don't want to do this on every project that is
> restored.

You could also use a separate schema for each project. Then restoring a
project would not step on or conflict with other projects in any way. And
you can do unions across schemas to do detailed reporting on all projects.
Then in the public or some master schema, keep a list of the project schemas
so you could even create dynamic views (via a function or stored procedure)
that union all current projects/schemas together when needed. Just an
idea...

As a second subject I will need to extract subsets of the content for
> external use in form of lists. Using the GUIDs of the entries would make
> them be more distinct and thus no addidtional numbering would be
> necessary for those lists. Currently I'm storing entries of different
> type in different tables. To have them distinct on a combined list I
> would need both the type and the id of a single row which would make up
> 16 Bytes if the type is bigserial as well (which it is in my scenario).

So there is a problem of space? How many records are you talking about that
makes adding a type such a concern?

-Aaron

In response to

Browse pgsql-admin by date

  From Date Subject
Next Message Aaron Bono 2006-06-27 18:18:18 Re: Hiding table structure information
Previous Message Chris Hoover 2006-06-27 16:49:25 Re: 8.1.3 - autovacuum question