
Re: How to add locale support for each column?

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com>
Cc: Greg Stark <gsstark(at)mit(dot)edu>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>,pgsql-hackers(at)postgresql(dot)org
Subject: Re: How to add locale support for each column?
Date: 2004-09-26 02:42:09
Message-ID: 873c15agzi.fsf@stark.xeocode.com
Lists: pgsql-hackers, pgsql-patches
Stephan Szabo <sszabo(at)megazone(dot)bigpanda(dot)com> writes:

> But shouldn't the comparison be against sorting on col not lower(col)?
> strxfrm(col) sorts seem comparable to col, strxfrm(lower(col)) sorts seem
> comparable to lower(col). Some collations do treat 'A' and 'a' as being
> adjacent in sort order, but that's not a guarantee, so it's not valid to
> say, "everywhere you'd use lower(col) you can use strxfrm instead."

Well, in my implementation strxfrm is a postgresql function. So I wanted to
compare it with an expression that had at least as much overhead as a
postgresql expression with a single function call.
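
The ordering point in the quoted text can be sketched in a few lines. This is
only an illustration, not the PostgreSQL function discussed above: it uses
Python's locale.strxfrm and locale.strcoll, which wrap the C library calls.
The sample words are made up, and LC_COLLATE is pinned to "C" purely as an
assumption so the result does not depend on which locales are installed.

```python
import locale
from functools import cmp_to_key

# Pin collation to the always-available "C" locale (an assumption made
# so the sketch behaves the same on any machine).
locale.setlocale(locale.LC_COLLATE, "C")

words = ["b", "A", "a", "B"]  # made-up sample data

# Sorting on strxfrm(col) gives the same order as comparing col with strcoll.
by_strcoll = sorted(words, key=cmp_to_key(locale.strcoll))
by_strxfrm = sorted(words, key=locale.strxfrm)

# Sorting on lower(col) is a different ordering, not a substitute for it.
by_lower = sorted(words, key=str.lower)

print(by_strcoll, by_strxfrm, by_lower)
```

Under a locale that interleaves upper and lower case, the first two lists
would still agree with each other, while the third may or may not match them.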

> And in past numbers you sent, it looked like the amounts were: 1s for sort
> on col, 1.5s for sort on lower(col), 2.5s for sort on strxfrm(col).  That
> doesn't seem negligible to me

Right, I amended my "negligible" claim. It's a significant but reasonable
slowdown. A 1.5s delay on sorting 100k rows is certainly not the kind of
delay that would make the idea of switching locales intolerable.

> unless that doesn't grow linearly with the number of rows.

Well, I was comparing sorting 206,000 rows. Even if it scales linearly, a 10s
delay on sorting 2M records isn't really fatal. I certainly wouldn't want to
remove the ability to sort using strcmp if the data is ASCII or binary. But if
you're going to use locale collation order it's going to be slower. strxfrm
has to do quite a bit of work. Even a postgres-internal mechanism is going to
have to do that same work.
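
A rough way to reproduce this kind of three-way comparison outside the
database is sketched below. It is an assumption-laden stand-in, not the test
that produced the numbers in this thread: the row count, row contents and the
helper names time_sort and random_rows are all invented here, and Python
computes each sort key once per row rather than once per comparison, so the
ratios will differ from a PostgreSQL sort.

```python
import locale
import random
import string
import time

# Use the environment's collation if it is usable, otherwise fall back to "C".
try:
    locale.setlocale(locale.LC_COLLATE, "")
except locale.Error:
    locale.setlocale(locale.LC_COLLATE, "C")


def random_rows(n, seed=0):
    """Build n made-up 12-letter strings, reproducibly."""
    rng = random.Random(seed)
    return ["".join(rng.choices(string.ascii_letters, k=12)) for _ in range(n)]


def time_sort(rows, key=None):
    """Return the seconds taken by one sort of rows with the given key."""
    start = time.perf_counter()
    sorted(rows, key=key)
    return time.perf_counter() - start


rows = random_rows(20_000)
timings = {
    "col": time_sort(rows),
    "lower(col)": time_sort(rows, key=str.lower),
    "strxfrm(col)": time_sort(rows, key=locale.strxfrm),
}
for label, seconds in timings.items():
    print(f"{label:>13}: {seconds:.4f}s")
```

Running it with different row counts gives a quick feel for how the gap
between the three sorts grows with the input size on a given machine.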

The only time you could save is the time it takes to look up "en_US" in a list
(or hash) of cached locales and switch a pointer. I suspect that's going to be
a small (but not negligible) portion of the overhead. I guess this is subject
to analysis; I'll try to do a gprof run at some point to answer that.
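
The "look up the locale in a hash of cached locales and switch a pointer" idea
can be sketched as follows. Everything named here (CollationCache, _LOADERS,
the "casefold" pseudo-locale) is hypothetical and stands in for real collation
tables; the point is only that the expensive load happens once per locale name
and every later lookup is a dictionary hit.

```python
from typing import Callable, Dict

KeyFunc = Callable[[str], str]

# Hypothetical stand-ins for real collation tables: building one is the
# expensive step, using one afterwards is a plain function call.
_LOADERS: Dict[str, Callable[[], KeyFunc]] = {
    "C": lambda: (lambda s: s),
    "casefold": lambda: str.casefold,
}


class CollationCache:
    """Cache of loaded collations, keyed by locale name."""

    def __init__(self) -> None:
        self._cache: Dict[str, KeyFunc] = {}
        self.loads = 0  # how many times a collation was actually loaded

    def get(self, name: str) -> KeyFunc:
        key_func = self._cache.get(name)
        if key_func is None:
            # Slow path: first use of this locale name.
            key_func = _LOADERS[name]()
            self._cache[name] = key_func
            self.loads += 1
        # Fast path: a hash lookup that returns a reference.
        return key_func


cache = CollationCache()
for _ in range(1000):
    cache.get("C")
    cache.get("casefold")

print(cache.loads)
```

After the loop, only two loads have happened despite 2000 lookups, which is
the part of the cost a built-in mechanism could keep small.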


> > I see no reason to think Postgres's implementation of looking up xfrm rules
> > for the specified locale will be any faster than the OS's. We know some OS's
> > suck but some certainly don't.
> 
> But do you have to change locales per row or per sort? Presumably, a built
> in implementation may be able to do the latter rather than the former.

We certainly need the ability to change the locales per-row, in fact possibly
multiple times per row.

Consider

select en,fr
  from translations
 order by en,fr

Which is actually something reasonable I could have to do in my current
project.
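
To make the per-row switching concrete, here is a sketch of the same ordering
done outside the database with the C library's strxfrm (via Python's locale
module). The locale names "en_US.UTF-8" and "fr_FR.UTF-8", the helper names
and the sample rows are assumptions; if a locale is not installed the sketch
falls back to "C". Note that each row needs two locale switches, one per
column, and that setlocale changes process-wide state.

```python
import locale


def xfrm_in(locale_name: str, value: str) -> str:
    """Transform value under locale_name, falling back to "C" if unavailable."""
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        locale.setlocale(locale.LC_COLLATE, "C")
    return locale.strxfrm(value)


def sort_translations(rows):
    """Order (en, fr) pairs by en under an English locale, then fr under a French one."""
    saved = locale.setlocale(locale.LC_COLLATE)  # remember the current setting
    try:
        return sorted(
            rows,
            key=lambda r: (
                xfrm_in("en_US.UTF-8", r[0]),  # first switch for this row
                xfrm_in("fr_FR.UTF-8", r[1]),  # second switch for this row
            ),
        )
    finally:
        locale.setlocale(locale.LC_COLLATE, saved)


rows = [("cat", "chat"), ("apple", "pomme"), ("apple", "ananas")]  # made-up data
ordered = sort_translations(rows)
print(ordered)
```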

However, changing locales should be nigh-instantaneous; it really ought to be
just changing a pointer. And in the API Tom foresees, it shouldn't even happen.
The only cost of sorting on many locales (aside from the initial load) would
be in the reduced cache hit rate from using more locale tables.

-- 
greg

