Re: ICU locale validation / canonicalization

From: Noah Misch <noah(at)leadboat(dot)com>
To: Jeff Davis <pgsql(at)j-davis(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: ICU locale validation / canonicalization
Date: 2023-07-01 17:31:32
Message-ID: 20230701173132.GC2764235@rfd.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, May 20, 2023 at 10:19:30AM -0700, Jeff Davis wrote:
> On Tue, 2023-05-02 at 07:29 -0700, Noah Misch wrote:
> > On Thu, Mar 30, 2023 at 08:59:41AM +0200, Peter Eisentraut wrote:
> > > On 30.03.23 04:33, Jeff Davis wrote:
> > > > Attached is a new version of the final patch, which performs
> > > > canonicalization. I'm not 100% sure that it's wanted, but it
> > > > still
> > > > seems like a good idea to get the locales into a standard format
> > > > in the
> > > > catalogs, and if a lot more people start using ICU in v16
> > > > (because it's
> > > > the default), then it would be a good time to do it. But perhaps
> > > > there
> > > > are risks?
> > >
> > > I say, let's do it.
> >
> > The following is not cause for postgresql.git changes at this time,
> > but I'm
> > sharing it in case it saves someone else the study effort.  Commit
> > ea1db8a
> > ("Canonicalize ICU locale names to language tags.") slowed buildfarm
> > member
> > hoverfly, but that disappears if I drop debug_parallel_query from its
> > config.
> > Typical end-to-end duration rose from 2h5m to 2h55m.  Most-affected
> > were
> > installcheck runs, which rose from 11m to 19m.  (The "check" stage
> > uses
> > NO_LOCALE=1, so it changed less.)  From profiles, my theory is that
> > each of
> > the many parallel workers burns notable CPU and I/O opening its ICU
> > collator
> > for the first time.
>
> I didn't repro the overall test timings (mine is ~1m40s compared to
> ~11-19m on hoverfly) but I think a microbenchmark on the ICU calls
> showed a possible cause.
>
> I ran open in a loop 10M times on the requested locale. The root locale
> ("und"[1], "root" and "") take about 1.3s to open 10M times; simple
> locales like 'en' and 'fr-CA' and 'de-DE' are all a little shower at
> 3.3s.
>
> Unrecognized locales like "xyz" take about 10 times as long: 13s to
> open 10M times, presumably to perform the fallback logic that
> ultimately opens the root locale. Not sure if 10X slower in the open
> path is enough to explain the overall test slowdown.
>
> My guess is that the ICU locale for these tests is not recognized, or
> is some other locale that opens slowly. Can you tell me the actual
> daticulocale?

As of commit b8c3f6d, InstallCheck-C got daticulocale=en-US-u-va-posix. Check
got daticulocale=NULL.

(The machine in question was unusable for PostgreSQL from 2023-05-12 to
2023-06-30, due to https://stackoverflow.com/q/76369660/16371536. That
delayed my response.)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Joe Conway 2023-07-01 19:52:50 Re: RFC: pg_stat_logmsg
Previous Message Thom Brown 2023-07-01 16:39:07 Re: Does a cancelled REINDEX CONCURRENTLY need to be messy?