Re: ICU locale validation / canonicalization

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ICU locale validation / canonicalization
Date: 2023-05-20 17:19:30
Message-ID: 423180d32bd2e9a61b839aff0dfefa1655d2fc1f.camel@j-davis.com
Lists: pgsql-hackers

On Tue, 2023-05-02 at 07:29 -0700, Noah Misch wrote:
> On Thu, Mar 30, 2023 at 08:59:41AM +0200, Peter Eisentraut wrote:
> > On 30.03.23 04:33, Jeff Davis wrote:
> > > Attached is a new version of the final patch, which performs
> > > canonicalization. I'm not 100% sure that it's wanted, but it
> > > still
> > > seems like a good idea to get the locales into a standard format
> > > in the
> > > catalogs, and if a lot more people start using ICU in v16
> > > (because it's
> > > the default), then it would be a good time to do it. But perhaps
> > > there
> > > are risks?
> >
> > I say, let's do it.
>
> The following is not cause for postgresql.git changes at this time,
> but I'm
> sharing it in case it saves someone else the study effort.  Commit
> ea1db8a
> ("Canonicalize ICU locale names to language tags.") slowed buildfarm
> member
> hoverfly, but that disappears if I drop debug_parallel_query from its
> config.
> Typical end-to-end duration rose from 2h5m to 2h55m.  Most-affected
> were
> installcheck runs, which rose from 11m to 19m.  (The "check" stage
> uses
> NO_LOCALE=1, so it changed less.)  From profiles, my theory is that
> each of
> the many parallel workers burns notable CPU and I/O opening its ICU
> collator
> for the first time.

I didn't reproduce the overall test timings (my run takes ~1m40s,
compared to ~11-19m on hoverfly), but I think a microbenchmark of the
ICU calls shows a possible cause.

I ran open in a loop 10M times on the requested locale. The root locale
("und"[1], "root" and "") takes about 1.3s to open 10M times; simple
locales like "en", "fr-CA" and "de-DE" are all a little slower at
3.3s.

Unrecognized locales like "xyz" take about 10 times as long as the root
locale: 13s to open 10M times, presumably to perform the fallback logic
that ultimately opens the root locale. I'm not sure whether a 10X
slowdown in the open path is enough to explain the overall test
slowdown.
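
For reference, a loop of this kind could be sketched roughly as below.
This is not the program used for the numbers above. It assumes that
"open" means ucol_open() followed by ucol_close(), and it measures CPU
time with clock(). The names open_close_once, time_opens, and the
USE_ICU and BENCH_MAIN macros are hypothetical. Without -DUSE_ICU the
ICU call is replaced by a stub, so the harness still compiles on a
machine that lacks the ICU headers.

```c
#include <stdio.h>
#include <time.h>

#ifdef USE_ICU
#include <unicode/ucol.h>
#endif

/*
 * Open and close one collator for the given locale name.
 * Returns 0 on success, -1 if ICU reports a failure.  ICU warnings,
 * such as falling back to another locale, do not count as failures.
 */
int
open_close_once(const char *locale)
{
#ifdef USE_ICU
    UErrorCode  status = U_ZERO_ERROR;
    UCollator  *coll = ucol_open(locale, &status);

    if (U_FAILURE(status))
        return -1;
    ucol_close(coll);
    return 0;
#else
    /* Stub (assumption): lets the harness build without ICU. */
    (void) locale;
    return 0;
#endif
}

/*
 * Repeat the open/close cycle and return the CPU seconds consumed,
 * or -1.0 if any open fails.
 */
double
time_opens(const char *locale, long iterations)
{
    clock_t     start = clock();
    long        i;

    for (i = 0; i < iterations; i++)
    {
        if (open_close_once(locale) != 0)
            return -1.0;
    }
    return (double) (clock() - start) / CLOCKS_PER_SEC;
}

#ifdef BENCH_MAIN
int
main(void)
{
    /* Locale names taken from the discussion above. */
    const char *locales[] = {"und", "root", "", "en", "fr-CA", "de-DE", "xyz"};
    size_t      n = sizeof(locales) / sizeof(locales[0]);
    size_t      i;

    for (i = 0; i < n; i++)
        printf("\"%s\": %.2fs\n", locales[i], time_opens(locales[i], 10000000L));
    return 0;
}
#endif
```

To time real ICU opens, something like
"cc -O2 -DUSE_ICU -DBENCH_MAIN bench.c -licui18n -licuuc" should work
where the ICU development files are installed. The file name and the
exact linker flags are assumptions and may differ by platform.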

My guess is that the ICU locale for these tests is not recognized, or
is some other locale that opens slowly. Can you tell me the actual
daticulocale?

Regards,
Jeff Davis

[1] It appears that "und" is also slow to open in ICU < 64. Hoverfly is
on v58, so it's possible that's the problem if daticulocale=und.
