Re: patch: Allow the UUID type to accept non-standard formats

From: Mark Mielke <mark(at)mark(dot)mielke(dot)cc>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Allow the UUID type to accept non-standard formats
Date: 2008-10-10 14:43:42
Message-ID: 48EF6A1E.2020400@mark.mielke.cc
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas wrote:
>> 1) Reduced error checking.
>> 2) The '-' is not the only character that people have used. ClearCase uses
>> '.' and ':' as punctuation.
>> 3) People already have the option of translating the UUID from their
>> application to a standard format.
>> 4) As you find below, and is probably possible to improve on, a fixed
>> format can be parsed more efficient.
>>
>
> Scenario 1. I have some standard format UUIDs and I want to parse
> them. This change doesn't bother me at all because if I'm parsing
> anywhere enough UUIDs for it to matter, the speed of my CPU, disk, and
> memory subsystems will vastly outweigh the difference between the two
> implementations. I measured the different between the two by running
> them both in a tight loop on a fixed string. I challenge anyone to
> produce a measurable performance distinction by issuing SQL queries.
> I doubt that it is possible.
>

Put a few changes of 2%-3% impact together and you get 10% or more. I'm
not saying you are wrong, but I disagree that performance should be
sacrificed for everybody without providing substantial benefit to
everybody. The question is then, does relaxed UUID parsing provide
substantial benefit to everybody?

> Scenario 2. I have some non-standard format UUIDs and I want to parse
> them. This change helps me a lot, because I'm almost positive that
> calling regexp_replace() and then uuid_in() is going to be MUCH slower
> than just calling uuid_in(). And if I do that then my error checking
> will be REALLY weak, unless I write a custom PL function to make sure
> that dashes only occur where they're supposed to be, in which case it
> will be even slower.
>

You should know the non-standard format of the UUID, and your
application should be doing the error checking. It might be slower for
*you*, but *you* are the one with the special needs. That is, unless you
are representing a significant portion of the population. What
percentage are you representing?

> Scenario 3. I only want standard-format UUIDs to be accepted into my
> database. Any non-standard format UUIDs should be rejected at parse
> time. This change is pretty irritating, because now I have to use
> regexp matching or something to make sure I've got the right format,
> and it's going to be significantly slower.
>
> My suspicion is that scenario 2 is a lot more common than scenario 3.
>

I prefer strict formats and early failures. I like that PostgreSQL
refuses to truncate on insertion. If I have a special format, I'm more
than willing to convert it from the special format to a standard format
before doing INSERT/UPDATE. What percentage of people out there feel
that they benefit from pedantic syntax checking? :-)

I don't know.

>> I don't know which implementation was used for the PostgreSQL core, but any
>> hard coded constants would allow for the optimizer to generate instructions
>> that can run in parallel, or that are better aligned to machine words.
>>
>> 2-3% slow down for what gain? It still doesn't handle all cases, and it's
>> less able to check the format for correctness.
>>
>
> This change is a long way from letting any old thing through as a
> UUID. I'm sure there are lots of crazy ways to write UUIDs, but
> everything I found with a quick Google search would be covered by this
> patch, so I think that's pretty good. A key point for me is that it's
> hard to imagine this patch accepting anything that was intended to be
> something other than a UUID. (I am sure someone will now write back
> and tell me about their favorite non-UUID thing that happens to have
> 32 hex digits with dashes for separators, but come on.)
>

It's not that long. If you get ColdFusion support(?), somebody else will
want the ':', and somebody else will want the '-'.

Anyways - I only somewhat disagree. I remember the original discussions,
and I remember agreeing with the points to keep PostgreSQL UUID support
thin and rigid. It's valuable for it to be built-in to the database.
It's not necessarily valuable for PostgreSQL to support every UUID
version or every format. Supporting additional formats is the direction
of supporting every UUID format. Three months from now, somebody is
going to propose allowing '-' or ':'. What should the answer be then?

Cheers,
mark

--
Mark Mielke <mark(at)mielke(dot)cc>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2008-10-10 14:49:59 Re: head's linking problem
Previous Message Heikki Linnakangas 2008-10-10 14:23:02 Re: TODO item: adding VERBOSE option to CLUSTER [with patch]