Re: Rework of collation code, extensibility

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Rework of collation code, extensibility
Date: 2023-01-17 22:18:20
Message-ID: CAH2-WzmDEYEobEjWYpnHPx3F92pxaG1W4TVFdhCMjjHK+voq9g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Jan 14, 2023 at 12:03 PM Jeff Davis <pgsql(at)j-davis(dot)com> wrote:
> The first goal I had was simply that the code was really hard to
> understand and work on, and refactoring was justified to improve the
> situation.
>
> The second goal, which is somewhat dependent on the first goal, is that
> we really need an ability to support multiple ICU libraries, and I
> wanted to do some common groundwork that would be needed for any
> approach we choose there, and provide some hooks to get us there. You
> are right that this goal influenced the first goal.

I don't disagree that it was somewhat independent of the first goal. I
just think that it makes sense to "round up to fully dependent".
Basically it's not independent enough to be worth talking about as an
independent thing, just as a practical matter - it's confusing at the
level of things like the commit message. There is a clear direction
that you're going in here from the start, and your intentions in 0001
do matter to somebody that's just looking at 0001 in isolation. That
is my opinion, at least.

The second goal is a perfectly good enough goal on its own, and one
that I am totally supportive of. Making the code clearer is icing on
the cake.

> ucol_strcollUTF8() accepts -1 to mean "nul-terminated". I did some
> basic testing and it doesn't seem like it's slower than using the
> length. If passing the length is faster for some reason, it would
> complicate the API because we'd need an entry point that's expecting
> nul-termination and lengths, which is awkward (as Peter E. pointed
> out).

That's good. I'm happy to leave it at that. I was only enquiring.

> I felt it was a little clearer amongst the other code, to a casual
> reader, but I suppose it's a style thing. I will change it if you
> insist.

I certainly won't insist.

> I'd have to expose the pg_locale_t struct, which didn't seem desirable
> to me. Do you think it's enough of a performance concern to be worth
> some ugliness there?

I don't know. Quite possibly not. It would be nice to have some data
on that, though.

--
Peter Geoghegan

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2023-01-17 22:37:37 PGDOCS - sgml linkend using single-quotes
Previous Message Tom Lane 2023-01-17 22:13:58 Re: pgsql: Doc: add XML ID attributes to <sectN> and <varlistentry> tags.