Re: Collation rules and multi-lingual databases

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Collation rules and multi-lingual databases
Date: 2003-08-24 20:32:15
Message-ID: 27478.1061757135@sss.pgh.pa.us
Lists: pgsql-hackers

Greg Stark <gsstark(at)mit(dot)edu> writes:
> The glibc docs sample code suggests using 2x the original string
> length for the initial buffer. My testing showed that *always*
> triggered the exceptional case. A bit of experimentation led to the
> 3x+4 which eliminates it except for 0 and 1 byte strings. I'm still
> tweaking it. But on another OS, or in a more complex collation locale
> maybe you would still trigger it a lot.

On HPUX it seems you always need 4x. Also, *there are bugs* in some
platforms' implementations of strxfrm, such that an undersized buffer
may get overrun anyway. I had originally tried to optimize the buffer
size like this in src/backend/utils/adt/selfuncs.c's use of strxfrm,
and eventually was forced to give it up as hopeless. I strongly suggest
using the same code now seen there:

    char       *xfrmstr;
    size_t      xfrmlen;
    size_t      xfrmlen2;

    /*
     * Note: originally we guessed at a suitable output buffer size,
     * and only needed to call strxfrm twice if our guess was too
     * small.  However, it seems that some versions of Solaris have
     * buggy strxfrm that can write past the specified buffer length
     * in that scenario.  So, do it the dumb way for portability.
     *
     * Yet other systems (e.g., glibc) sometimes return a smaller value
     * from the second call than the first; thus the Assert must be <=
     * not == as you'd expect.  Can't any of these people program
     * their way out of a paper bag?
     */
    xfrmlen = strxfrm(NULL, val, 0);
    xfrmstr = (char *) palloc(xfrmlen + 1);
    xfrmlen2 = strxfrm(xfrmstr, val, xfrmlen + 1);
    Assert(xfrmlen2 <= xfrmlen);
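
For anyone who wants to try the same pattern outside the backend, here is a
minimal standalone sketch. It assumes plain malloc() and assert() in place of
palloc() and Assert(), and the helper name xfrm_dup is made up for
illustration; it is not something that exists in the PostgreSQL tree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * xfrm_dup (hypothetical helper, not a PostgreSQL function):
 * return a malloc'd copy of the strxfrm() transformation of val,
 * or NULL if the allocation fails.  The caller frees the result.
 */
char *
xfrm_dup(const char *val)
{
    size_t      xfrmlen;
    size_t      xfrmlen2;
    char       *xfrmstr;

    /* First call: a zero-length buffer only asks for the required size. */
    xfrmlen = strxfrm(NULL, val, 0);

    xfrmstr = malloc(xfrmlen + 1);
    if (xfrmstr == NULL)
        return NULL;

    /* Second call: the buffer is sized from the first call's answer. */
    xfrmlen2 = strxfrm(xfrmstr, val, xfrmlen + 1);

    /* Some libcs report a smaller length the second time, hence <=. */
    assert(xfrmlen2 <= xfrmlen);
    (void) xfrmlen2;            /* avoid an unused warning under NDEBUG */

    return xfrmstr;
}
```

Comparing two such results with strcmp() should give the same ordering as
strcoll() on the original strings under the current locale, which is the
whole point of paying for the transformation up front.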

regards, tom lane
