Re: How can sort performance be so different

From: Bob Jolliffe <bobjolliffe(at)gmail(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: pgsql-performance(at)lists(dot)postgresql(dot)org
Subject: Re: How can sort performance be so different
Date: 2019-01-31 13:29:59
Message-ID: CACd=f9cO5OcmAeBK1M5a3+t8BO7E91Ki0WaLAxFm=FBbon_HDw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hi Peter

I did check out using ICU and the performance does indeed seem
comparable with C locale:

EXPLAIN ANALYZE select * from chart order by name COLLATE "lo-x-icu";
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Sort (cost=1470.65..1504.24 rows=13436 width=1203) (actual
time=82.752..85.723 rows=13436 loops=1)
Sort Key: name COLLATE "lo-x-icu"
Sort Method: quicksort Memory: 6253kB
-> Seq Scan on chart (cost=0.00..549.36 rows=13436 width=1203)
(actual time=0.043..12.634 rows=13436 loops=1)
Planning time: 1.610 ms
Execution time: 96.060 ms
(6 rows)

The Laos folk have confirmed that the sort order with C locale was not
correct. So setting the ICU locale seems to be the way forward.

The problem is that this is a large java application with a great
number of tables and queries. Also it is used in 60+ countries not
just Laos. So we cannot simply modify the queries or table creation
scripts directly such as in the manner above. I was hoping the
solution would just be to set a default locale on the database
(perhaps even und-x-icu) but I see now that this doesn't seem to be
currently possible with postgresql 10 ie. set the locale on database
creation to a *-icu locale.

Is this also a limitation on postgresql 11? (Upgrading would be possible)

Any other workarounds worth trying? The magnitude of this issue is
significant - 1000x slower on these basic sorts is crippling the
application, probably also in a number of other queries.

Regards
Bob

On Wed, 30 Jan 2019 at 23:54, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
>
> On Wed, Jan 30, 2019 at 3:57 AM Bob Jolliffe <bobjolliffe(at)gmail(dot)com> wrote:
> > (i) whether the sort order makes sense for the Laos names; and
> > (ii) what the locale settings are on the production server where the
> > problem was first reported.
> >
> > There will be some turnaround before I get this information. I am
> > guessing that the database is using "en_US" rather than anything Laos
> > specific. In which case "C" would probably be no worse re sort order.
> > But will know better soon.
> >
> > This has been a long but very fruitful investigation. Thank you all for input.
>
> If you can find a way to use an ICU collation, it may be possible to
> get Laotian sort order with performance that's a lot closer to the
> performance you see with the C locale. The difference that you're
> seeing is obviously explainable in large part by the C locale using
> the abbreviated keys technique. The system glibc's collations cannot
> use this optimization.
>
> I believe that some locales have inherently more expensive
> normalization processes (comparisons) than others, but if you can
> effective amortize the cost per key by building an abbreviated key, it
> may not matter that much. And, ICU may be faster than glibc anyway.
>
> --
> Peter Geoghegan

In response to

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Arne Roland 2019-01-31 18:19:54 RE: dsa_allocate() faliure
Previous Message Laurenz Albe 2019-01-31 11:57:39 Re: Setting effective_cache size