Re: PATCH: CITEXT 2.0

From: "David E(dot) Wheeler" <david(at)kineticode(dot)com>
To: Zdenek Kotala <Zdenek(dot)Kotala(at)Sun(dot)COM>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: CITEXT 2.0
Date: 2008-07-03 04:38:32
Message-ID: 55AA0795-AA11-4D44-B7F6-90B3A933365F@kineticode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Jul 2, 2008, at 09:13, Zdenek Kotala wrote:

> I went through your code and I have following comments/questions:

Thanks. I'm on a short family vacation, so it will probably be Monday
before I can submit a new patch. I got a few replies below, though.

> 1) formating
>
> Please, do not type space before asterix:
> char * lcstr, * rcstr -> char *lcstr, *rcstr
>
> and do not put extra space in a brackets
> citextcmp( left, right ) -> citextcmp(left, right)

Okay.

> Other seems to me OK (remove 8.2 version mention at the bottom)

Um, are you referring to this (at the top):

+// PostgreSQL 8.2 Magic.
+#ifdef PG_MODULE_MAGIC
+PG_MODULE_MAGIC;
+#endif

That's gotta be there (though I can of course remove the comment).

> 2) citextcmp function returns int but citext_cmp returns int32. I
> think citextcmp should returns int32 as well.

Yeah, I'm a bit confused by this. I followed what's in varlena.c on
this. The comparison functions are documented supposed to return 1, 0,
or -1, which of course would be covered by int. But varstr_cmp(),
which ultimately returns the value, returns all kinds of numbers. So,
perhaps someone could say what it's *supposed* to be?

> 3) citext_larger, citext_smaller function have memory leak. You
> don't call PG_FREE_IF_COPY on unused text.
>
> I'm not sure if return value in these function should be duplicated
> or not.

These I also duplicated from varlena.c, and I asked about a potential
memory leak in a previous email. If there's a leak here, would there
not also be a leak in varlena.c?

> 4) Operator = citext_eq is not correct. See comment http://doxygen.postgresql.org/varlena_8c.html#8621d064d14f259c594e4df3c1a64cac

So should citextcmp() call strncmp() instead of varst_cmp()? The
latter is what I saw in varlena.c.

> There must be difference between equality and collation for example
> in Czech language 'láska' and 'laská' are different word it means
> that 'láska' != 'laská'. But there is no difference in collation
> order. See Unicode Universal Collation Algorithm for detail.

I'll leave the collation stuff to the functions I call (*far* from my
specialty), but I'll add a test for this and make sure it works as
expected. Um, although, with what collation should it be tested? The
tests I wrote assume en_US.UTF-8.

> 5) There are several commented out lines in CREATE OPERATOR
> statement mostly related to NEGATOR. Is there some reason for that?

I copied it from the original citext.sql. Not sure what effect it has.

> Also OPERATOR || has probably wrong negator.

Right, good catch.

> 6) You use pgTAP for unit test. It is something what should be
> widely discussed. It is new approach and I'm not correct person to
> say if it good or not.

Sure. I created a pgFoundry project for it here:

http://pgtap.projects.postgresql.org/

> I see there two problems:
> At first point there is not clear licensing (https://svn.kineticode.com/pgtap/trunk/README.pgtap
> ).

That's a standard revised BSD license.

> And, It should be implemented as a general framework by my opinion
> not only as a part of citext contrib module.

It is. I just put the file in citext so that the tests can use it.
It's distributed on pgFoundry and maintained at the SVN URL quoted in
the file.

> Please, let me know when you will have updated code and I will start
> deep testing.

Will do.

Best,

David

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2008-07-03 04:39:13 Re: PATCH: CITEXT 2.0
Previous Message Ken Camann 2008-07-03 04:13:55 Re: A Windows x64 port of PostgreSQL