Re: patch: Allow the UUID type to accept non-standard formats

From: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To: "Robert Haas" <robertmhaas(at)gmail(dot)com>, "Mark Mielke" <mark(at)mark(dot)mielke(dot)cc>
Cc: "Pg Hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: Allow the UUID type to accept non-standard formats
Date: 2008-10-10 10:20:21
Message-ID: 758d5e7f0810100320y7a1f160bwba65df921e51a282@mail.gmail.com
Lists: pgsql-hackers

On Fri, Oct 10, 2008 at 7:28 AM, Mark Mielke <mark(at)mark(dot)mielke(dot)cc> wrote:
> Robert Haas wrote:
>> While we could perhaps accept only those variant formats which we
>> specifically know someone to be using, it seems likely that people
>> will keep moving those pesky dashes around, and we'll likely end up
>> continuing to add more formats and arguing about which ones are widely
>> enough used to deserve being on the list. So my vote is - as long as
>> they don't put a dash in the middle of a group of four (aka a byte),
>> just let it go.
> I somewhat disagree with supporting other formats. Reasons include:
>
> 1) Reduced error checking.

Hmm, I tend to disagree. If UUIDs were variable length (a different number
of digits), then perhaps yes. But since all UUIDs have the same number of
digits, the dashes in between them act only as decoration.
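
To illustrate the point, here is a minimal sketch of a check that ignores
dashes yet still rejects malformed input. The function name
uuid_digit_count_ok is hypothetical and this is not the code in the patch;
it is also looser than the proposal, since it lets a dash appear anywhere.

```c
#include <ctype.h>
#include <stdbool.h>

/*
 * Hypothetical helper, not PostgreSQL's actual code: returns true if `s`
 * holds exactly 32 hex digits, with '-' allowed anywhere between them.
 */
static bool
uuid_digit_count_ok(const char *s)
{
    int ndigits = 0;

    for (; *s != '\0'; s++)
    {
        if (*s == '-')
            continue;               /* decoration only, carries no data */
        if (!isxdigit((unsigned char) *s))
            return false;           /* any other character is an error */
        if (++ndigits > 32)
            return false;           /* too many digits */
    }
    return ndigits == 32;           /* too few digits is rejected as well */
}
```

A truncated value or a stray non-hex character is still caught, because the
digit count is fixed regardless of where the dashes sit.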

> 2) The '-' is not the only character that people have used. ClearCase uses
> '.' and ':' as punctuation.

In that case I would lean toward also accepting MAC-address-style notation
(AA:BB:CC:DD), but I think that is going too far... So I am for sticking with
dashes and groups of four :)

> 3) People already have the option of translating the UUID from their
> application to a standard format.

Regexp, the Swiss Army knife of data manipulation. ;)

While possible, it really is not that easy or efficient. At the very least we
should accept dashless UUIDs, so that instead of tediously reformatting the
UUID one could simply do s/-//g

> 4) As you find below, and is probably possible to improve on, a fixed
> format can be parsed more efficient.

What I was thinking about is using the same lookup-table approach that the
encode()/decode() pair uses. It should be faster than the current
implementation, and skipping over '-' (and even ':' or '.') is even simpler.
I don't know the internals well enough to say how that would work in
encodings like UTF-16...

See http://doxygen.postgresql.org/encode_8c-source.html#l00107
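
A rough sketch of what I mean, in the spirit of the table in encode.c but not
copied from it. The names hexlookup, init_hexlookup and uuid_parse_lookup are
made up for illustration, and the code assumes an ASCII-compatible,
one-byte-per-character input, so it leaves the UTF-16 question open. Dashes are
accepted only on a group-of-four boundary.

```c
#include <stdbool.h>
#include <stdint.h>

#define UUID_LEN 16

/* Maps a byte to its hex value, or -1 if it is not a hex digit. */
static int8_t hexlookup[256];
static bool hexlookup_ready = false;

/* Fill the table once; assumes ASCII-style contiguous 'a'..'f' and 'A'..'F'. */
static void
init_hexlookup(void)
{
    int i;

    for (i = 0; i < 256; i++)
        hexlookup[i] = -1;
    for (i = 0; i < 10; i++)
        hexlookup['0' + i] = (int8_t) i;
    for (i = 0; i < 6; i++)
    {
        hexlookup['a' + i] = (int8_t) (10 + i);
        hexlookup['A' + i] = (int8_t) (10 + i);
    }
    hexlookup_ready = true;
}

/*
 * Hypothetical parser: writes 16 bytes to `out` and returns true on success.
 * On failure the contents of `out` are unspecified.
 */
static bool
uuid_parse_lookup(const char *src, unsigned char out[UUID_LEN])
{
    int nibbles = 0;

    if (!hexlookup_ready)
        init_hexlookup();

    for (; *src != '\0'; src++)
    {
        int v;

        if (*src == '-')
        {
            if (nibbles % 4 != 0)
                return false;       /* dash inside a group of four */
            continue;
        }
        v = hexlookup[(unsigned char) *src];
        if (v < 0 || nibbles >= 2 * UUID_LEN)
            return false;           /* not a hex digit, or too many digits */
        if (nibbles % 2 == 0)
            out[nibbles / 2] = (unsigned char) (v << 4);    /* high nibble */
        else
            out[nibbles / 2] |= (unsigned char) v;          /* low nibble */
        nibbles++;
    }
    return nibbles == 2 * UUID_LEN;
}
```

Accepting ':' or '.' as well would only mean widening the `*src == '-'` test.
Note that this sketch also tolerates a leading or trailing dash, which a real
implementation might want to reject.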

Best regards,
Dawid Kuroczko
--
.................. ``The essence of real creativity is a certain
: *Dawid Kuroczko* : playfulness, a flitting from idea to idea
: qnex42(at)gmail(dot)com : without getting bogged down by fixated demands.''
`..................' Sherkaner Underhill, A Deepness in the Sky, V. Vinge
