Re: ICU for global collation

From: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org>
Subject: Re: ICU for global collation
Date: 2022-01-11 09:10:25
Message-ID: c3093547-6ce2-66ca-71de-3a3c633cfb02@enterprisedb.com
Lists: pgsql-hackers


On 10.01.22 07:00, Julien Rouhaud wrote:
>>> And then I changed in varstr_cmp():
>>>
>>>     if (collid != DEFAULT_COLLATION_OID)
>>>         mylocale = pg_newlocale_from_collation(collid);
>>>
>>> to just
>>>
>>>     mylocale = pg_newlocale_from_collation(collid);
>>>
>>> I find that the \timing results are indistinguishable. (I used locale
>>> "en_US.UTF-8" and made sure that that code path is actually hit.)
>>>
>>> Does anyone have other insights?
>>
>> Looking at the git history, you added this comment in 414c5a2ea65.
>>
>> After a bit of digging in the lists, I found that you introduced it to fix a
>> reported 13% slowdown in varstr_cmp():
>> https://www.postgresql.org/message-id/20110129075253.GA18784%40tornado.leadboat.com
>> https://www.postgresql.org/message-id/1296748408.6442.1.camel%40vanquo.pezone.net
>
> So I tried to run Noah's benchmark to see if I could reproduce the slowdown.
> Unfortunately the results I'm getting don't really make sense as removing the
> optimisation brings a 15% speedup, and with a few more runs I can see that I
> have about 25% noise, so there isn't much I can do to help.

Heh, I had the same experience: it actually got faster without the
optimization, but the difference then got lost in the noise on further
testing.

Looking back at those discussions, I don't think those old test results
are relevant anymore. In the patch that was being tested there,
pg_newlocale_from_collation() did not contain

    if (collid == DEFAULT_COLLATION_OID)
        return (pg_locale_t) 0;

so the default collation actually went through most or all of the
function and did a lot of work. That would understandably be quite
slow. But just calling a function and returning immediately should not
be a problem. Otherwise, the call to check_collation_set() in
varstr_cmp() and elsewhere would be just as bad.
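
To illustrate, here is a minimal standalone sketch of that early-return
shape. It is not PostgreSQL source: the sketch_* names, the counter, and
the slow-path helper are hypothetical stand-ins, and the typedefs only
mimic the real ones. It shows that a caller without its own
"collid != DEFAULT_COLLATION_OID" guard never reaches the expensive path
for the default collation.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-ins for the PostgreSQL types; assumptions for illustration only. */
typedef unsigned int Oid;
struct pg_locale_struct { int unused; };
typedef struct pg_locale_struct *pg_locale_t;

#define DEFAULT_COLLATION_OID 100	/* stand-in value for this sketch */

/* Counts how often the (hypothetical) expensive path is taken. */
static int	sketch_expensive_lookups = 0;
static struct pg_locale_struct sketch_dummy_locale;

/* Hypothetical slow path: stands in for cache and catalog lookups. */
static pg_locale_t
sketch_lookup_locale_slow(Oid collid)
{
	(void) collid;
	sketch_expensive_lookups++;
	return &sketch_dummy_locale;
}

/* Early return for the default collation, then the slow path. */
static pg_locale_t
sketch_newlocale_from_collation(Oid collid)
{
	if (collid == DEFAULT_COLLATION_OID)
		return (pg_locale_t) 0;

	return sketch_lookup_locale_slow(collid);
}

/* A call site with no guard of its own, as proposed. */
static pg_locale_t
sketch_caller(Oid collid)
{
	return sketch_newlocale_from_collation(collid);
}

/* Small self-check exercising both paths. */
static void
sketch_demo(void)
{
	sketch_expensive_lookups = 0;

	/* Default collation: returns immediately, no slow-path work. */
	assert(sketch_caller(DEFAULT_COLLATION_OID) == NULL);
	assert(sketch_expensive_lookups == 0);

	/* Any other collation (12345 is arbitrary) takes the slow path once. */
	assert(sketch_caller(12345) != NULL);
	assert(sketch_expensive_lookups == 1);
}
```

With this shape, the only extra cost for the default collation is one
function call and one comparison.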

So, unless there are concerns, I'm going to see about making a patch to
call pg_newlocale_from_collation() even with the default collation.
That would make the actual feature patch quite a bit smaller, since we
won't have to patch every call site of pg_newlocale_from_collation().
