Re: Refactor to introduce pg_strcoll().

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Refactor to introduce pg_strcoll().
Date: 2022-11-04 22:06:41
Message-ID: 4124e778e2ebc93d8e64b836b0037f5426aa08d6.camel@j-davis.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, 2022-11-01 at 13:36 +0100, Peter Eisentraut wrote:
> But I think putting the Windows UTF-8 code (win32_utf8_wcscoll())
> from
> varstr_cmp() into pg_strcoll() might create future complications.
> Normally, it would be up to varstr_sortsupport() to set up a
> Windows/UTF-8 specific comparator function, but it says there it's
> not
> implemented.  But someone who wanted to implement that would have to
> lift it back out of pg_strcoll, or rearrange the code in some other
> way.

Is there a reason that it wouldn't work to just use
varlenafastcmp_locale() after my patch? The comment says:

/*
* There is a further exception on Windows. When the database
* encoding is UTF-8 and we are not using the C collation, complex
* hacks are required...

But I think the complex hacks are just the transformation into UTF 16
and calling of wcscoll(). If that's done from within pg_strcoll()
(after my patch), then I think it just works, right?

I can't easily test on windows, so perhaps I'm missing something. Does
the buildfarm have enough coverage here? Is it reasonable to try a
commit that removes the:

#ifdef WIN32
if (GetDatabaseEncoding() == PG_UTF8 &&
!(locale && locale->provider == COLLPROVIDER_ICU))
return;
#endif

along with my patch and see what I get out of the buildfarm?

> Perhaps you have some follow-up work
> planned, in which case it might be better to consider this in more
> context. 

I was also considering an optimization to use stack allocation for
short strings when doing the non-UTF8 icu comparison. I am seeing some
benefit there, but it seems to depend on the size of the stack buffer.
That suggests that maybe TEXTBUFSIZE is too large (1024) and perhaps we
should drop it down, but I need to investigate more.

In general, I'm trying to slowly get the locale stuff more isolated.

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Juan José Santamaría Flecha 2022-11-04 22:08:24 Re: WIN32 pg_import_system_collations
Previous Message Andres Freund 2022-11-04 21:21:26 Re: remap the .text segment into huge pages at run time