Re: ISN patch that applies cleanly with git apply

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Peter Geoghegan <peter(dot)geoghegan86(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>, Jan Otto <asche(at)me(dot)com>
Subject: Re: ISN patch that applies cleanly with git apply
Date: 2010-10-18 13:35:17
Message-ID: AANLkTikZjnDMWwJBJpUyEp2ucptU+T7opVHA=AGLc185@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Oct 3, 2010 at 9:53 AM, Peter Geoghegan
<peter(dot)geoghegan86(at)gmail(dot)com> wrote:
> Whoops...seems there was some minor mangling when I applied the
> original patch by:
>
> peter(at)linux-peter-home:~/postgresql> patch --version
> GNU patch 2.6.1.81-5b68
> ****snip***
> peter(at)linux-peter-home:~/postgresql> patch -c < contrib_isn-1.patch
>
>
> I've attached a revised version, which I've carefully eye-balled and
> corrected manually. The issue before was that an else if(...){...} was
> repeated. Sorry about that.
>
>
> On 2 October 2010 21:10, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Even more to the point, what about a link to the relevant changes in the
>> standard?  It's impossible for anyone to tell whether these changes are
>> sane in a vacuum, and a regression test will prove nothing at all except
>> perhaps self-consistency.
>
> Here's the problem....as far as I'm aware, there was no change to the
> standard. The standard is extremely simple. What has changed is that
> the relevant authorities (regional ISBN agencies) have continued to
> allocate new publisher codes as new publishers have started trading,
> and we have fallen out of lock step with them. That, and an
> additional, reserved "bookland" country code has come into use for
> ISBN-13 (namely, 979). This affected hyphenation of ISBNs, which is
> what the patch author complained about, though it didn't cause valid
> ISBNs to be rejected outright.
>
> I have some misgivings about the design of contrib/isn.
>
> The isbn domains are a bit like a regex domain to validate e-mail
> addresses that has every single TLD in the world baked in - it fails
> to take into account that TLDs come and go, just as contrib/isn fails
> to take into account that new ISBN ranges are created over time. You
> should either accept that you can't do that because it's beyond your
> remit as a domain author, and be happy with just validating the
> syntax, or validate against an external, maintained database of valid
> TLDs (there's a CPAN module that does just that and more). We cannot
> know all ISBN ranges in advance, because they haven't all been
> allocated yet, and never will be. The patch even says "Range Table as
> of 2010-Jul-29".
>
> While we're on the topic of contrib/isn's shortcomings, I think that
> there should really be an additional convenience domain that enforces
> that the value is some type of barcode (be it a UPC, EAN-13 or
> whatever), which could be implemented with a simple CASE statement in
> the check constraint, and there should be an EAN-8 domain and a
> GTIN-14 domain. At the moment, I use a domain (which is AS bigint)
> with a check constraint that calls a function that's like this:
>
> -- returns checkdigit validity for UPC, EAN-8, EAN-13 and GTIN-14
> CREATE OR REPLACE FUNCTION is_gtin(bigint)
> RETURNS BOOLEAN
> LANGUAGE sql
> STRICT IMMUTABLE
> AS $$
> SELECT ( sum(dgt) % 10 ) = 0
>        FROM
>        (
>                        SELECT substring($1::text from idx for 1)::smallint AS dgt
>                        FROM (SELECT generate_series(length($1::text), 1, -2) as idx) AS foo
>                UNION ALL
>                        SELECT substring($1::text from idx for 1)::smallint * 3 AS dgt
>                        FROM (SELECT generate_series(length($1::text) -1, 1, -2) as idx) AS foo
>        ) AS bar
>
> $$;
>
> I'm not sure that the ISBN datatypes should be internally represented
> as a 64-bit integer only - there should be some additional data that
> specifies hyphenation. I'm not sure that we should be in the
> hyphenation catch-up business. The fact is, however, that we are.
> However, as far as I can tell at this stage, the patch doesn't make
> the situation any worse, and is a temporary fix.
>
> It's quite possible that I've overstated the problem, and that we
> should continue to do our level best to hyphenate correctly for the
> user, but it's important to understand that we cannot guarantee
> correct hyphenation over time. Perhaps Jan Otto can weigh in here.

I may be in the minority here, but I'm inclined to just apply this and
move on. I agree with your concerns that this whole module is badly
designed and that the approach of hard-coding the list of ISBN ranges
is fundamentally unscalable, but unless we're going to rip it out or
rearchitect it, we don't really get any benefit out of digging in our
heels and refusing to apply occasional band-aids. What should
probably be done in the long-term is either:

1. Stop storing the list of ISBN ranges in an array and put it in a
table. That way, users without C programming skills can still update
the list of ranges, and users with C programming skills can do it
without recompiling.

or possibly

2. Don't try to validate the list of ISBN ranges at all.

Between now and when someone does one of those things, however, this
has some value.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2010-10-18 13:54:41 Re: ask for review of MERGE
Previous Message David Fetter 2010-10-18 13:29:55 Re: knngist - 0.8