Re: PATCH: CITEXT 2.0 v3

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: CITEXT 2.0 v3
Date: 2008-07-12 22:13:13
Message-ID: A2AA4B32-A916-497C-9C2B-DC69B8BF952A@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jul 12, 2008, at 14:57, Tom Lane wrote:

> There was some discussion earlier about whether the proposed
> regression
> tests for citext are suitable for use in contrib or not. After
> playing
> with them for awhile, I have to come down very firmly on the side of
> "not". I have these gripes:

You're nothing if not thorough, Tom.

> 1. The style is gratuitously different from every other regression
> test
> in the system. This is not a good thing. If it were an amazingly
> better way to do things, then maybe, but as far as I can tell the
> style
> pgTAP forces on you is really pretty darn poorly suited for SQL tests.
> You have to contort what could naturally be expressed in SQL as a
> table
> result into a scalar. Plus it's redundant with the expected-output
> file.

Sure. I wasn't aware of the existing regression test methodology when
I wrote pgTAP and those tests. I'm fine to change them and use pgTAP
for testing my app code, rather than PostgreSQL itself.

> 2. It's ridiculously slow; at least a factor of ten slower than doing
> equivalent tests directly in SQL. This is a very bad thing. Speed of
> regression tests matters a lot to those of us who run them a dozen
> times
> per day --- and I do not wish to discourage any developers who don't
> work that way from learning better habits ;-)

Hrm. I'm wonder why it's so slow? The test functions don't really do a
lot. Anyway, I agree that they should perform well.

> Because of #1 and #2 I find the use of pgTAP to be a nonstarter.
> I have a couple of gripes about the substance of the tests as well:
>
> 3. What you are mostly testing is not the behavior of citext itself,
> but the behavior of the underlying strcoll function. This is pretty
> much doomed to failure, first because the regression tests are
> intended
> to work in multiple locales (and we are *not* giving that up for
> citext), and second because the behavior of strcoll isn't all that
> portable across platforms even given allegedly similar locale settings
> (as we already found out yesterday).

Yes, I *just* ran the tests on Ubuntu and opened my mail to ask about
the bizarre differences when I saw this message. Thanks for answering
my question before I asked it. :-)

> Sadly, I think you have to give up
> attempts to check the interesting multibyte cases and confine yourself
> to tests using ASCII strings.

Grr. Kind of defeats the purpose. Is there no infrastructure for
testing multibyte functionality? Are test database clusters all built
using SQL_ASCII and the C locale?

> 4. A lot of the later test cases are equally uselessly testing whether
> piggybacking over text functions works. The odds of ever finding
> anything with those tests are not distinguishable from zero (unless
> the
> underlying text function is busted, which is not your responsibility
> to
> test). So I don't see any point in putting them into the standard
> regression package. (What maybe *would* be useful to test, but you
> didn't, is whether the result of a function is considered citext
> rather
> than text.)

Well, I was doing test-driven development: I wrote the tests to ensure
that I had added the functions for CITEXT properly, and when they
passed, I could move on. As a unit tester, it'd feel weird for me to
drop these. I'm not testing the underlying functions; I'm making sure
that they work properly with CITEXT.

> I suggest something more like the attached as a suitable regression
> test. I got bored about halfway through and started to skim, so there
> might be a few tests toward the end of the original set that would be
> worth transposing into this one, but nothing jumped out at me.

Thanks! I'll work this in.

Best,

David

PS: I confirmed late yesterday that the memory leak I saw was with my
version for 8.3, where I had my own str_tolower(). It clearly has a
leak that I'll have to fix, but it has no bearing on the contribution
to 8.4, which has no such leak. Thanks for running the test yourself
to confirm.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2008-07-13 03:30:33 Re: PATCH: CITEXT 2.0 v3
Previous Message Tom Lane 2008-07-12 21:57:27 Re: PATCH: CITEXT 2.0 v3