Re: Collation rules and multi-lingual databases

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Greg Stark <gsstark(at)mit(dot)edu>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Collation rules and multi-lingual databases
Date: 2003-08-23 09:51:16
Message-ID: 87isoohfcb.fsf@stark.dyndns.tv
Lists: pgsql-hackers

Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> writes:

> Since most of that work is for an exceptional case, maybe it'd be safer
> (although slower) to structure the function as

Yeah I thought of that. But if making it a critical section is cheap then it's
probably a better approach. The problem with restoring the locale for the
palloc is that if the user is unlucky he might sort a table of thousands of
strings that all trigger the exception case.

The glibc docs sample code suggests using 2x the original string length for
the initial buffer. My testing showed that *always* triggered the exceptional
case. A bit of experimentation led to the 3x+4 which eliminates it except for
0 and 1 byte strings. I'm still tweaking it. But on another OS, or in a more
complex collation locale, maybe you would still trigger it a lot. Even as it is,
if you happen to try to sort a large list of single character strings you would
trigger it a lot.
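
For illustration only, here is a minimal sketch of the sizing logic described
above. It is not the patch under discussion: the name xfrm_dup and the
"retried" flag are hypothetical, and plain malloc/realloc stand in for palloc.
It guesses 3x+4 bytes, then falls back to an exact-size second pass when
strxfrm reports that the guess was too small.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper, not PostgreSQL code: return a malloc'd strxfrm()
 * image of src under the current LC_COLLATE, or NULL if out of memory.
 * If retried is non-NULL it is set to 1 when the first guess was too small.
 */
char *
xfrm_dup(const char *src, int *retried)
{
    size_t guess, need;
    char *buf, *bigger;

    assert(src != NULL);
    if (retried != NULL)
        *retried = 0;

    guess = 3 * strlen(src) + 4;    /* the 3x+4 heuristic from the message */
    buf = malloc(guess);
    if (buf == NULL)
        return NULL;

    /* strxfrm returns the length it needs, not counting the trailing NUL */
    need = strxfrm(buf, src, guess);
    if (need >= guess)
    {
        /* Exceptional case: buf contents are indeterminate, so redo it. */
        bigger = realloc(buf, need + 1);
        if (bigger == NULL)
        {
            free(buf);
            return NULL;
        }
        buf = bigger;
        strxfrm(buf, src, need + 1);
        if (retried != NULL)
            *retried = 1;
    }
    return buf;
}
```

Counting how often "retried" comes back as 1 over a sample of real data would
be one way to check whether a given multiplier is adequate on a particular OS
and locale.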

I have some documentation reading to do apparently before I can fix this up.

> setlocale
> call strxfrm (and that's it)
> setlocale back
> if there wasn't enough space
>   make a new buffer
>   setlocale
>   call strxfrm (and that's it)
>   setlocale back
>
> Probably putting the sl/strxfrm/sl into its own function.
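
A rough sketch of the structure quoted above, under stated assumptions: the
names xfrm_in_locale and xfrm_dup_in_locale are hypothetical, only LC_COLLATE
is switched, and malloc/realloc stand in for palloc. The point it tries to
show is that every allocation happens while the original locale is in effect,
so nothing that can fail sits between the two setlocale calls except strxfrm
itself.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: one setlocale/strxfrm/setlocale round trip.
 * Nothing else runs while the collation locale is switched.
 * Returns 0 and stores the needed length in *need, or -1 on failure.
 */
int
xfrm_in_locale(const char *locale, char *dst, const char *src,
               size_t n, size_t *need)
{
    const char *cur = setlocale(LC_COLLATE, NULL);
    char *saved;

    if (cur == NULL)
        return -1;
    /* later setlocale calls may overwrite the returned string, so copy it */
    saved = malloc(strlen(cur) + 1);
    if (saved == NULL)
        return -1;
    strcpy(saved, cur);

    if (setlocale(LC_COLLATE, locale) == NULL)
    {
        free(saved);
        return -1;
    }
    *need = strxfrm(dst, src, n);
    setlocale(LC_COLLATE, saved);   /* setlocale back */
    free(saved);
    return 0;
}

/* Hypothetical caller: first try, then one retry with an exact-size buffer. */
char *
xfrm_dup_in_locale(const char *locale, const char *src)
{
    size_t size, need;
    char *buf, *bigger;

    assert(locale != NULL && src != NULL);
    size = 3 * strlen(src) + 4;
    buf = malloc(size);
    if (buf == NULL)
        return NULL;
    if (xfrm_in_locale(locale, buf, src, size, &need) != 0)
    {
        free(buf);
        return NULL;
    }
    if (need >= size)
    {
        /* make a new buffer while the original locale is in effect */
        bigger = realloc(buf, need + 1);
        if (bigger == NULL)
        {
            free(buf);
            return NULL;
        }
        buf = bigger;
        if (xfrm_in_locale(locale, buf, src, need + 1, &need) != 0)
        {
            free(buf);
            return NULL;
        }
    }
    return buf;
}
```

The cost of this shape is the one raised in the reply: a string that misses
the first guess pays for four setlocale calls instead of two.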

--
greg
