Question on Bizarre Sorting (ORDER BY in 7.1)

From: <pgsql-general(at)commandprompt(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Subject: Question on Bizarre Sorting (ORDER BY in 7.1)
Date: 2001-04-24 18:10:55
Message-ID: Pine.LNX.4.30.0104241043480.5175-100000@crazypenguins.commandprompt.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

Good day,

We've run into a strange bit of sorting behavior with the new release of
PostgreSQL 7.1. Specifically, we have some text that we're using as
threadids in a discussion board, which look like the following example:

threadid
----------------------
000-0987877374-00313
___-0987877410-00316
___-0987877430-00317
100-0987877381-00314
100-0987877395-00315
200-0987877461-00318

The signifigance of the numbers is secondary to the alphanumeric sorting
of them. You can see above that the first three characters are either
numeric or underscores. We were using the underscores as a means to force
"unrated" threads to be sorted after rated threads, and with PostgreSQL
7.0.3, and with some CVS snapshots for 7.1, it worked fine! If I performed
the query:

lxp=# SELECT threadid FROM test ORDER BY threadid;

I'd get:

threadid
----------------------
000-0987877374-00313
100-0987877381-00314
100-0987877395-00315
200-0987877461-00318
___-0987877410-00316
___-0987877430-00317

However, at some point between the last snapshot we grabbed (several weeks
ago) and the release of 7.1, this behavior has changed. If I do the same
sort now, I get:

lxp=# SELECT threadid FROM test ORDER BY threadid;
threadid
----------------------
000-0987877374-00313
___-0987877410-00316
___-0987877430-00317
100-0987877381-00314
100-0987877395-00315
200-0987877461-00318
(6 rows)

At first blush, it seems that it's somehow coming to the conclusion that
the underscore alphanumerically follows the 0, and preceds the 1. (?!)
However, that's not the end of it! Observe this unpredictable behavior
with ordering by substrings:

lxp=# SELECT substr(threadid,1,5) FROM test ORDER BY substr(threadid, 1, 5);
substr
--------
___-0
___-0
000-0
100-0
100-0
200-0
(6 rows)

Now, the underscores appear to PRECEDE the 0's. This seems at least a
little more sane, however this is completely the opposite of where the
underscore would be sorted with 7.0.3. Now consider the next substring, of
six characters instead of five.

lxp=# SELECT substr(threadid,1,6) FROM test ORDER BY substr(threadid, 1, 6);
substr
---------
000-09
___-09
___-09
100-09
100-09
200-09
(6 rows)

Back to the underscores fitting between 0 and 1 again, simply by adding
a 9 to the end of the ids. Logically, I'm at a loss for why this should be.

I've already re-factored my system to use purely numeric values for
sorting, because it was impairing the capability of our message boards to
be properly sequenced, but I was interested in knowing whether or not this
is a bug, a change in the way PostgreSQL sorts, or possibly some kind of
locale-specific misconfiguration?

Any insight would be appreciated,
Jw @ Command Prompt.
--
By way of pgsql-general(at)commandprompt(dot)com(dot)

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Richard Huxton 2001-04-24 19:02:04 Re: Problem with postgreSQL (number of backends)
Previous Message John Oakes 2001-04-24 17:21:32 JDBC speed question.