Re: Rework of collation code, extensibility

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rework of collation code, extensibility
Date: 2023-02-13 23:45:41
Message-ID: 6ec4ad7f93f255dbb885da0a664d9c77ed4872c4.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

New version attached. Changes:

* I moved the GUC patch to the end (so you can ignore it if it's not
useful for review)
* I cut out the pg_locale_internal.h rearrangement (at least for now,
it might seem useful after the dust settles on the other changes).
* I added a separate patch for pg_locale_deterministic().
* I added a separate patch for a simple cleanup of a USE_ICU special
case.

Now the patches are:

0001: pg_strcoll/pg_strxfrm
0002: pg_locale_deterministic()
0003: cleanup a USE_ICU special case
0004: GUCs (only for testing, not for commit)

Responses to your review comments inline below:

On Mon, 2023-02-13 at 11:35 +0100, Peter Eisentraut wrote:
> The commit message for 0002 says "Also remove the TRUST_STRXFRM
> define",
> but I think this is incorrect, as that is done in the 0001 patch.

Fixed.

> I don't like that the pg_strnxfrm() function requires these kinds of
> repetitive error checks:
>
> +           if (rsize != bsize)
> +               elog(ERROR, "pg_strnxfrm() returned unexpected
> result");
>
> This could be checked inside the function itself, so that the callers
> don't have to do this themselves every time.

The current API allows for a pattern like:

/* avoids extra work if existing buffer is big enough */
len = pg_strxfrm(buf, src, bufSize, loc);
if (len >= bufSize)
{
buf = repalloc(len+1);
bufSize = len+1;
len2 = pg_strxfrm(buf, src, bufSize, loc);
}

The test for rsize != bsize are just there to check that the underlying
library calls (strxfrm or getSortKey) behave as documented, and we
expect that they'd never be hit. It's hard to move that kind of check
into pg_strxfrm() without making it also manage the buffers.

Do you have a more specific suggestion? I'd like to keep the API
flexible enough that the caller can manage the buffers, like with
abbreviated keys. Perhaps the check can just be removed if we trust
that the library functions at least get the size calculation right? Or
turned into an Assert?

--
Jeff Davis
PostgreSQL Contributor Team - AWS

Attachment Content-Type Size
v10-0001-Add-pg_strcoll-pg_strxfrm-and-variants.patch text/x-patch 41.7 KB
v10-0002-Introduce-pg_locale_deterministic.patch text/x-patch 7.9 KB
v10-0003-Remove-unnecessary-ifdef-USE_ICU.patch text/x-patch 5.3 KB
v10-0004-Introduce-GUCs-to-control-abbreviated-keys-sort-.patch text/x-patch 8.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-02-14 00:03:49 Re: pglz compression performance, take two
Previous Message Andres Freund 2023-02-13 23:44:23 Re: Adding "large" to PG_TEST_EXTRA