Re: How to add locale support for each column?

From: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
To: Greg Stark <gsstark(at)mit(dot)edu>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: How to add locale support for each column?
Date: 2004-09-25 15:47:28
Message-ID: 20040925083506.I92255@megazone.bigpanda.com
Lists: pgsql-hackers pgsql-patches


On Sun, 19 Sep 2004, Greg Stark wrote:

> Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> writes:
>
> > Greg Stark <gsstark(at)mit(dot)edu> writes:
> > > Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > >> 2) switching the locale at run time is too expensive when using the system
> > >> library.
> >
> > > Fwiw I did some experiments with this and found it wasn't true.
> >
> > Really?
>
> We're following two different methodologies so the results aren't comparable.
> I exposed strxfrm to postgres and then did a sort on strxfrm(col). The
> resulting query times were slower than sorting on lower(col) by negligible
> amounts.

But shouldn't the comparison be against sorting on col, not lower(col)?
A sort on strxfrm(col) is comparable to a sort on col, and a sort on
strxfrm(lower(col)) is comparable to a sort on lower(col). Some collations
do treat 'A' and 'a' as adjacent in sort order, but that's not guaranteed,
so it's not valid to say, "everywhere you'd use lower(col) you can use
strxfrm instead."
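
To illustrate the pairing, here is a minimal sketch in Python, whose locale
module wraps the C library's strcoll and strxfrm. The word list is
hypothetical, and the "C" locale is an assumption made only because it is the
one locale guaranteed to be installed; any other installed locale name can be
substituted.

```python
import functools
import locale

# Assumption: "C" is used only because it exists on every system.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["banana", "Apple", "apple", "Banana"]  # hypothetical column values

# Sorting on col: strcoll comparisons vs. precomputed strxfrm keys.
by_strcoll = sorted(words, key=functools.cmp_to_key(locale.strcoll))
by_strxfrm = sorted(words, key=locale.strxfrm)

# Sorting on lower(col): plain lowercase keys vs. strxfrm of the lowercase.
by_lower = sorted(words, key=str.lower)
by_strxfrm_lower = sorted(words, key=lambda s: locale.strxfrm(s.lower()))

print("strxfrm(col) matches col:", by_strxfrm == by_strcoll)
print("strxfrm(lower(col)) matches lower(col):", by_strxfrm_lower == by_lower)
print("strxfrm(col) matches lower(col):", by_strxfrm == by_lower)
```

In the "C" locale every uppercase ASCII letter sorts before every lowercase
one, so the strxfrm(col) order and the lower(col) order differ for this list.
Other locales may happen to agree on a given input, which is why agreement
cannot be relied on.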

And in the past numbers you sent, it looked like the amounts were: 1s for
sort on col, 1.5s for sort on lower(col), 2.5s for sort on strxfrm(col).
That doesn't seem negligible to me unless the cost doesn't grow linearly
with the number of rows. It also seems that if that was the only difference
between the queries, then the time spent in strxfrm was significant
compared to the rest of the query time.
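
The arithmetic behind that objection can be spelled out with a short sketch.
The three timings are the ones recalled above, not new measurements, and
attributing the whole difference to the function call assumes the queries
were otherwise identical.

```python
# Timings in seconds as recalled in the message; not new measurements.
sort_col = 1.0
sort_lower = 1.5
sort_strxfrm = 2.5

# Extra time relative to sorting on the bare column.
lower_overhead = sort_lower - sort_col
strxfrm_overhead = sort_strxfrm - sort_col

# Share of the strxfrm query spent on the extra work, under the
# assumption that nothing else differs between the queries.
strxfrm_share = strxfrm_overhead / sort_strxfrm

print(f"lower(col) overhead:   {lower_overhead:.1f}s")
print(f"strxfrm(col) overhead: {strxfrm_overhead:.1f}s")
print(f"strxfrm share of its query: {strxfrm_share:.0%}")
```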

> > These are on machines of widely varying horsepower, so the absolute
> > numbers shouldn't be compared across rows, but the general story holds:
> > setlocale should be considered to be at least an order of magnitude
> > slower than strcoll, and on non-glibc machines it can be a whole lot
> > worse than that.
>
> I don't see how this is relevant though. One way or another postgres is going
> to have to sort strings in varying locales chosen at run-time. Comparing
> against strcoll's execution time without changing locales is a straw
> man. It's like comparing your tcp/ip bandwidth with the loopback interface's
> bandwidth.
>
> I see no reason to think Postgres's implementation of looking up xfrm rules
> for the specified locale will be any faster than the OS's. We know some OS's
> suck but some certainly don't.

But do you have to change locales per row or per sort? Presumably, a
built-in implementation may be able to do the latter rather than the former.
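
The difference between the two patterns can be sketched by counting locale
switches. This is only an illustration in Python using the standard locale
module, not how PostgreSQL does or would do it; both function names are
hypothetical, and the "C" locale is assumed solely because it is always
available.

```python
import functools
import locale


def sort_switching_per_comparison(rows, name):
    """Switch LC_COLLATE around every comparison.

    Returns (sorted rows, number of locale switches).
    """
    switches = 0

    def compare(a, b):
        nonlocal switches
        previous = locale.setlocale(locale.LC_COLLATE)  # query only
        locale.setlocale(locale.LC_COLLATE, name)
        try:
            return locale.strcoll(a, b)
        finally:
            locale.setlocale(locale.LC_COLLATE, previous)
            switches += 2  # one switch in, one switch back

    result = sorted(rows, key=functools.cmp_to_key(compare))
    return result, switches


def sort_switching_per_sort(rows, name):
    """Switch LC_COLLATE once for the whole sort, then restore it.

    Returns (sorted rows, number of locale switches).
    """
    previous = locale.setlocale(locale.LC_COLLATE)  # query only
    locale.setlocale(locale.LC_COLLATE, name)
    try:
        result = sorted(rows, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)
    return result, 2


rows = ["pear", "Apple", "fig", "apple"]  # hypothetical column values
per_cmp_rows, per_cmp_switches = sort_switching_per_comparison(rows, "C")
per_sort_rows, per_sort_switches = sort_switching_per_sort(rows, "C")
print(per_cmp_rows == per_sort_rows, per_cmp_switches, per_sort_switches)
```

The per-sort variant pays for two switches regardless of input size, while
the per-comparison variant pays for two switches on every comparison the
sort performs.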
