Re: improve Chinese locale performance

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Greg Stark <stark(at)mit(dot)edu>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Craig Ringer <craig(at)2ndquadrant(dot)com>, Quan Zongliang <quanzongliang(at)gmail(dot)com>, Pgsql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: improve Chinese locale performance
Date: 2013-07-22 17:39:50
Message-ID: 51ED6E66.4040803@dunslane.net
Lists: pgsql-hackers


On 07/22/2013 12:49 PM, Greg Stark wrote:
> On Mon, Jul 22, 2013 at 12:50 PM, Peter Eisentraut <peter_e(at)gmx(dot)net> wrote:
>> I think part of the problem is that we call strcoll for each comparison,
>> instead of doing strxfrm once for each datum and then just strcmp for
>> each comparison. That is effectively equivalent to what the proposal
>> implements.
> FWIW I used to be a big proponent of using strxfrm. But upon further
> analysis I realized it was a really difficult tradeoff. strxfrm
> potentially saves a lot of CPU cost, but at the expense of expanding the
> size of the sort key. If the sort spills to disk, or is even just
> memory-bandwidth limited, it might actually be slower than doing the
> additional CPU work of calling strcoll.
>
> It's hard to see how to decide in advance which way will be faster. I
> suspect strxfrm is still the better bet, especially for locales with
> large, complex character sets like Chinese. strcoll might still win by a
> large margin on simple, mostly-ASCII character sets.
>
>
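To make the two strategies concrete, here is a minimal sketch. It uses Python's standard `locale` module as a stand-in for calling the C functions directly (the module wraps the platform C library's collation routines). `sort_strcoll` calls strcoll inside every comparison; `sort_strxfrm` calls strxfrm once per string and lets the sort compare the transformed keys directly, which plays the role of strcmp. The function names and call counters are illustrative only and are not taken from the PostgreSQL sources.

```python
import locale
from functools import cmp_to_key


def sort_strcoll(words):
    """Per-comparison collation: strcoll() runs inside every comparison,
    so sorting n strings costs on the order of n log n collation calls."""
    calls = 0

    def cmp(a, b):
        nonlocal calls
        calls += 1
        return locale.strcoll(a, b)

    return sorted(words, key=cmp_to_key(cmp)), calls


def sort_strxfrm(words):
    """Transform-once collation: strxfrm() runs exactly once per string,
    and the sort then compares the transformed keys by code point (the
    analogue of calling strcmp() on strxfrm() output in C)."""
    calls = 0

    def xfrm(s):
        nonlocal calls
        calls += 1
        return locale.strxfrm(s)

    return sorted(words, key=xfrm), calls


if __name__ == "__main__":
    try:
        locale.setlocale(locale.LC_COLLATE, "")  # collation from the environment
    except locale.Error:
        locale.setlocale(locale.LC_COLLATE, "C")  # fall back if env locale is missing
    words = ["pear", "Apple", "banana", "apple", "cherry", "Banana"]
    by_coll, n_coll = sort_strcoll(words)
    by_xfrm, n_xfrm = sort_strxfrm(words)
    print("same order:", by_coll == by_xfrm)
    print(f"strcoll calls: {n_coll}, strxfrm calls: {n_xfrm}")
```

Both functions should produce the same order, since comparing strxfrm output is defined to agree with strcoll. The counters show the CPU-side saving; what the sketch does not show is the memory cost of carrying the larger transformed keys through the sort.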

Perhaps we need a bit of performance testing to prove the point.
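A rough harness for that kind of test might look like the sketch below. It again assumes Python's `locale` module as a stand-in for calling strcoll/strxfrm from C, and `make_words` and `time_sorts` are hypothetical helpers. It measures only in-memory CPU time, so it cannot show the disk-spill or memory-bandwidth effect Greg describes, and interpreter overhead will blur the difference. Treat any numbers as indicative at best.

```python
import locale
import random
import string
import time
from functools import cmp_to_key


def make_words(n, length=12, alphabet=string.ascii_letters, seed=0):
    """Deterministic random strings; pass a CJK alphabet to model the
    Chinese case instead of mostly-ASCII data."""
    rng = random.Random(seed)
    return ["".join(rng.choice(alphabet) for _ in range(length)) for _ in range(n)]


def time_sorts(words, repeat=3):
    """Best-of-`repeat` wall-clock seconds for each strategy."""
    strategies = {
        "strcoll per comparison": lambda: sorted(words, key=cmp_to_key(locale.strcoll)),
        "strxfrm once, then compare": lambda: sorted(words, key=locale.strxfrm),
    }
    results = {}
    for name, run in strategies.items():
        best = float("inf")
        for _ in range(repeat):
            start = time.perf_counter()
            run()
            best = min(best, time.perf_counter() - start)
        results[name] = best
    return results


if __name__ == "__main__":
    try:
        locale.setlocale(locale.LC_COLLATE, "")  # collation from the environment
    except locale.Error:
        locale.setlocale(locale.LC_COLLATE, "C")
    ascii_words = make_words(20_000)
    cjk = "".join(chr(c) for c in range(0x4E00, 0x4E00 + 2000))  # CJK Unified Ideographs
    cjk_words = make_words(20_000, length=6, alphabet=cjk)
    for label, words in (("ascii", ascii_words), ("cjk", cjk_words)):
        for name, secs in time_sorts(words).items():
            print(f"{label:5} {name}: {secs:.3f} s")
```

Running it once under a Chinese locale (if one is installed) and once under a mostly-ASCII locale would give a first, crude data point. A real answer would need the same comparison inside PostgreSQL's own sort code.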

Maybe the behaviour should be locale-dependent.
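If the choice were made per locale, one input worth measuring is how much strxfrm inflates the key in each locale, since that is the cost side of Greg's tradeoff. The sketch below assumes Python's `locale` module once more. `expansion_by_locale` is a hypothetical helper, and the locale names are platform-specific guesses; any that are not installed are skipped. Python's strxfrm returns a wide-character string, so the ratio is in characters rather than the bytes C's strxfrm would produce.

```python
import locale


def expansion_by_locale(words, candidates):
    """Map each usable locale name to the strxfrm key-size expansion ratio
    (total transformed length / total original length) over `words`.
    Locales that are not installed on this machine are skipped."""
    saved = locale.setlocale(locale.LC_COLLATE)  # query the current setting
    results = {}
    try:
        for name in candidates:
            try:
                locale.setlocale(locale.LC_COLLATE, name)
            except locale.Error:
                continue  # locale not available here
            src = sum(len(w) for w in words)
            xfrm = sum(len(locale.strxfrm(w)) for w in words)
            results[name] = xfrm / src if src else 1.0
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)  # restore caller's collation
    return results


if __name__ == "__main__":
    sample = ["apple", "Banana", "cherry", "北京", "上海", "广州"]
    for name, ratio in expansion_by_locale(
        sample, ["C", "en_US.UTF-8", "zh_CN.UTF-8"]
    ).items():
        print(f"{name}: strxfrm keys are {ratio:.1f}x the input length")
```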

cheers

andrew
