Re: ISN patch that applies cleanly with git apply

From: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-03 13:53:34
Message-ID: AANLkTimydWxkc-qdi8cRj11ArQx74EBVtb_1bmmXzb7e@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Whoops...seems there was some minor mangling when I applied the
original patch by:

peter(at)linux-peter-home:~/postgresql> patch --version
GNU patch 2.6.1.81-5b68
****snip***
peter(at)linux-peter-home:~/postgresql> patch -c < contrib_isn-1.patch

I've attached a revised version, which I've carefully eye-balled and
corrected manually. The issue before was that an else if(...){...} was
repeated. Sorry about that.

On 2 October 2010 21:10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Even more to the point, what about a link to the relevant changes in the
> standard?  It's impossible for anyone to tell whether these changes are
> sane in a vacuum, and a regression test will prove nothing at all except
> perhaps self-consistency.

Here's the problem....as far as I'm aware, there was no change to the
standard. The standard is extremely simple. What has changed is that
the relevant authorities (regional ISBN agencies) have continued to
allocate new publisher codes as new publishers have started trading,
and we have fallen out of lock step with them. That, and an
additional, reserved "bookland" country code has come into use for
ISBN-13 (namely, 979). This affected hyphenation of ISBNs, which is
what the patch author complained about, though it didn't cause valid
ISBNs to be rejected outright.

I have some misgivings about the design of contrib/isn.

The isbn domains are a bit like a regex domain to validate e-mail
addresses that has every single TLD in the world baked in - it fails
to take into account that TLDs come and go, just as contrib/isn fails
to take into account that new ISBN ranges are created over time. You
should either accept that you can't do that because it's beyond your
remit as a domain author, and be happy with just validating the
syntax, or validate against an external, maintained database of valid
TLDs (there's a CPAN module that does just that and more). We cannot
know all ISBN ranges in advance, because they haven't all been
allocated yet, and never will be. The patch even says "Range Table as
of 2010-Jul-29".

While we're on the topic of contrib/isn's shortcomings, I think that
there should really be an additional convenience domain that enforces
that the value is some type of barcode (be it a UPC, EAN-13 or
whatever), which could be implemented with a simple CASE statement in
the check constraint, and there should be an EAN-8 domain and a
GTIN-14 domain. At the moment, I use a domain (which is AS bigint)
with a check constraint that calls a function that's like this:

-- returns checkdigit validity for UPC, EAN-8, EAN-13 and GTIN-14
CREATE OR REPLACE FUNCTION is_gtin(bigint)
RETURNS BOOLEAN
LANGUAGE sql
STRICT IMMUTABLE
AS $$
SELECT ( sum(dgt) % 10 ) = 0
FROM
(
SELECT substring($1::text from idx for 1)::smallint AS dgt
FROM (SELECT generate_series(length($1::text), 1, -2) as idx) AS foo
UNION ALL
SELECT substring($1::text from idx for 1)::smallint * 3 AS dgt
FROM (SELECT generate_series(length($1::text) -1, 1, -2) as idx) AS foo
) AS bar

$$;

I'm not sure that the ISBN datatypes should be internally represented
as a 64-bit integer only - there should be some additional data that
specifies hyphenation. I'm not sure that we should be in the
hyphenation catch-up business. The fact is, however, that we are.
However, as far as I can tell at this stage, the patch doesn't make
the situation any worse, and is a temporary fix.

It's quite possible that I've overstated the problem, and that we
should continue to do our level best to hyphenate correctly for the
user, but it's important to understand that we cannot guarantee
correct hyphenation over time. Perhaps Jan Otto can weigh in here.

--
Regards,
Peter Geoghegan

Attachment Content-Type Size
fixed_git_isn.patch text/x-patch 17.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hitoshi Harada 2010-10-03 14:06:25 Re: wip: functions median and percentile
Previous Message Hitoshi Harada 2010-10-03 13:47:28 Re: [RRR] top-level DML under CTEs