From: | Samuel Gendler <sgendler(at)ideasculptor(dot)com> |
---|---|
To: | Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> |
Cc: | Eyal Wilde <eyal(at)impactsoft(dot)co(dot)il>, pgsql-sql(at)postgresql(dot)org |
Subject: | Re: a strange order by behavior |
Date: | 2011-06-22 09:39:34 |
Message-ID: | BANLkTi=7Lqyj4LbRPHp1mLo_0-mFN5XPmA@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-sql |
Interesting. The original thread to which I was referring has a subject of
"Sorting Issue" and the original request showed a list of vehicle model
names which were sorting as though there were no spaces. The user had
collation set to en_US.UTF-8. However, my database (on OS X) sorts both his
example and the example that started this thread correctly, despite my
lc_collate being set to the same value.
Then I just ran the exact same test on a Centos 5 linux host (postgresql 8.4
as well) and I am seeing the 'erroneous' sort order that prompted both this
thread and the other. So you can't even assume the same behaviour for the
same collation on different platforms.
On OS X:
# \l
Name | Owner | Encoding | Collation | Ctype |
Access privileges
-------------------+------------+----------+-------------+-------------+-----------------------
col_test | u1 | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
# show lc_collate;
lc_collate
-------------
en_US.UTF-8
# select * from t1 order by f1;
id | f1
----+-------------------
1 | CX Hatchback
2 | CX Minivan
2 | CX Plus Minivan
2 | CX Sedan
2 | CX Sport Utility
2 | CXL Minivan
2 | CXL Premium Sedan
2 | CXL Sedan
2 | CXL Sport Utility
2 | CXL Turbo Sedan
2 | CXS Sedan
On CentOS 5:
# \l
Name | Owner | Encoding | Collation | Ctype |
Access privileges
-------------------+------------+----------+-------------+-------------+-----------------------
col_test | u1 | UTF8 | en_US.UTF-8 | en_US.UTF-8 |
# show lc_collate
ecorithm-# ;
lc_collate
-------------
en_US.UTF-8
# select * from t1 order by f1;
id | f1
----+-------------------
1 | CX Hatchback
2 | CXL Minivan
2 | CXL Premium Sedan
2 | CXL Sedan
2 | CXL Sport Utility
2 | CXL Turbo Sedan
2 | CX Minivan
2 | CX Plus Minivan
2 | CX Sedan
2 | CX Sport Utility
2 | CXS Sedan
Further testing would seem to reveal that OS X is using something resembling
C collation order, despite the fact that it says en_US.UTF-8. I say this
because it is also case sensitive. CentOS, on the other hand, is ignoring
spaces, but is also case-insensitive.
# select * from t1 order by f1 asc;
id | f1
----+-------------------
1 | CX Hatchback
2 | CXL Minivan
2 | CXL Premium Sedan
2 | CXL Sedan
2 | CXL Sport Utility
2 | CXL Turbo Sedan
2 | CX Minivan
2 | CX Plus Minivan
2 | cx Sedan
2 | CX Sedan
2 | CX Sport Utility
2 | CXS Sedan
Pavel suggested using a collation of ucs_basic, but I get an error when I
try that on linux:
$ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test
createdb: database creation failed: ERROR: invalid locale name ucs_basic
I was able to create the db with --lc_collate=C and get case-sensitive
sorting that treats spaces 'correctly,' but I have no idea how reliable that
is with multibyte characters and it almost certainly doesn't handle accented
characters correctly in languages that have a mix of ascii and non-ascii
characters, since the non-ascii chars will all sort as greater than the
ascii chars, which is probably not how things are alphabetized in those
languages.
I happen to have my computer set so I can type cyrillic and it appears to
work correctly with non-ascii chars when the entire charset is non-ascii,
but it'd be a lot of effort to construct a test that utilizes a mix of
1,2,3, and 4 byte characters, since I'm not familiar with enough alphabets.
test=# select * from t1 order by f1;
id | f1
----+-------------------
1 | CX Hatchback
2 | CX Minivan
2 | CX Plus Minivan
2 | CX Sedan
2 | CX Sport Utility
2 | CXL Minivan
2 | CXL Premium Sedan
2 | CXL Sedan
2 | CXL Sport Utility
2 | CXL Turbo Sedan
2 | CXS Sedan
2 | cx Sedan
2 | АИВЕТ
2 | П АИВЕТ
2 | П РИВЕТ
2 | П СИВЕТ
2 | ПРИВЕТ
2 | РИВЕТ
2 | СИВЕТ
2 | привет
From | Date | Subject | |
---|---|---|---|
Next Message | Pavel Stehule | 2011-06-22 10:01:46 | Re: a strange order by behavior |
Previous Message | Pavel Stehule | 2011-06-22 08:45:44 | Re: a strange order by behavior |