Re: Weird sorting order

From: "Kevin Grittner" <Kevin(dot)Grittner(at)wicourts(dot)gov>
To: <pgsql-admin(at)postgresql(dot)org>, "Robert Voinea" <robert(dot)voinea(at)topex(dot)ro>
Subject: Re: Weird sorting order
Date: 2010-07-19 14:11:39
Message-ID: 4C4416CB020000250003389C@gw.wicourts.gov
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Robert Voinea <robert(dot)voinea(at)topex(dot)ro> wrote:

> => show lc_collate;
> lc_collate
> -------------
> en_US.UTF-8

I'm afraid the order you're seeing is what you're supposed to get
for that collation sequence. In that collation, special characters
(including spaces) are only used as tie-breakers for values which
are tied when the special characters are ignored. There may be a
few consequences of that which you haven't yet found. As one
example:

test=# show lc_collate;
lc_collate
-------------
en_US.UTF-8
(1 row)

test=# create table t1 (c1 text);
CREATE TABLE
test=# insert into t1 values ('one'),(' one'),('one
'),('##one'),('one##');
INSERT 0 5

test=# select '"' || c1 || '"' from t1 order by c1;
?column?
----------
"one"
" one"
"##one"
"one "
"one##"
(5 rows)

test=# select '"' || c1 || '"' from t1 order by c1 desc;
?column?
----------
"one##"
"one "
"##one"
" one"
"one"
(5 rows)

For that reason, we have (so far) used the C locale, which provides
the binary sort you probably expected, and we use special columns,
maintained by triggers, to control selection and sequencing as
needed -- for example we have a "searchName" column in any table
where we have name columns, which is forced into a canonical format.

FWIW, our algorithm for generating a canonical name also excludes
spaces and the '#' character, although it still has significant
differences from the en_US.UTF-8 collation.

-Kevin

In response to

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Burgholzer, Robert (DEQ) 2010-07-19 14:28:47 Re: Weird sorting order
Previous Message Robert Voinea 2010-07-19 05:23:13 Re: Weird sorting order