Re: PATCH: CITEXT 2.0 v3

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: CITEXT 2.0 v3
Date: 2008-07-14 19:04:55
Message-ID: 7D3F0762-F324-4F9D-9ECF-02AACC894AD7@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jul 14, 2008, at 11:49, Tom Lane wrote:

>> You don't seem to understand what I'm suggesting: I'm not talking
>> about testing strcoll. I'm talking about making sure that citext
>> *uses* strcoll.
>
> This seems pointless, as well as essentially impossible to enforce by
> black-box testing. Nobody is likely to consider a patch that removes
> such obviously basic functionality of the module.

Never underestimate the human capacity for incompetence. One of my
mottos.

> I think you're
> worrying about testing the wrong things --- and as evidence I'd note
> the
> serious errors that slipped by your testing (including the fact that
> the
> last submitted version would still claim that 'a' = 'ab').

Um, say what? I had that problem at one time, but it was when I was
writing my own lower() function, which was shitty (I wasn't creating a
nul-terminated string). I don't recall that being in any patch I sent
to the list, and indeed you wrote that very test in the revised test
script you sent in.

At any rate, I make no claims that my tests properly covered
everything. There is a lot I didn't know to test, but I'm learning
more all the time. That's why this code review has been so immensely
valuable, both for me and for CITEXT.

> What we
> generally ask a regression test to do is catch portability problems
> (which is certainly a real-enough issue here, but we don't know how
> to address it well) or catch foreseeable breakage (such as someone
> introducing a cast that breaks resolution of citext function calls).
> The methodology can't catch everything, and trying to push it beyond
> what it can do is just a time sink for effort that can more usefully
> be spent other ways, such as code-reading.

I guess what I'm thinking of is not regression tests but unit tests.
And with unit testing, one of the goals is coverage. Good coverage has
saved my ass many times in the past, even catching changes that, in
hindsight, should obviously never have been attempted. Perhaps it'd be
worth thinking about creating a project for unit testing PostgreSQL in
controlled environments? Maybe having tests that can contain proper
conditionals?

Anyway, it seems that, given the limitations of the current testing
system with regard to testing multibyte characters, the issue is moot.
I'll throw in a few tests that test multibyte characters, but I'll
comment them out and just uncomment them for individual test runs on
my box for the purposes of my own sanity going forward.

Thanks,

David

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Gregory Stark 2008-07-14 19:09:55 Re: Fwd: Proposal - UUID data type
Previous Message Tom Lane 2008-07-14 18:49:06 Re: PATCH: CITEXT 2.0 v3