Re: sortsupport for text

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Robert Haas <robertmhaas(at)gmail(dot)com>, Greg Stark <stark(at)mit(dot)edu>, PG Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: sortsupport for text
Date: 2012-06-19 16:17:29
Message-ID: CAEYLb_XhhPopKcW5MYtLpoAS1zXry0GRqf0KqYP7tRZAq6bd5w@mail.gmail.com
Lists: pgsql-hackers

On 19 June 2012 16:17, Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
>
>> So, just to give a bit more weight to my argument that we should
>> recognise that equivalent strings ought to be treated identically
>
> Since we appear to be questioning everything in this area, I'll
> raise something which has been bugging me for a while: in some other
> systems I've used, the "tie-breaker" comparison for equivalent
> values comes after equivalence sorting on *all* sort keys, rather
> than *each* sort key.

Are you sure that they actually have a tie-breaker, and don't just
make the distinction between equality and equivalence (if only
internally)? I would have checked that myself already, but I don't
have access to any other RDBMS that I'd expect to care about these
kinds of distinctions. Such distinctions make sense for ensuring that the
text comparator's notion of equality is consistent with text's general
notion (if that's bitwise equality, which I suspect it is in these
other products too, for the same reasons it is for us). I don't see why
you'd want a tie-breaker across multiple keys. I mean, you could, I
just don't see any reason to.

> test=# select * from c order by 2;
>  last_name | first_name
> -----------+------------
>  smith     | bob
>  SMITH     | EDWARD
>  smith     | peter
> (3 rows)
>
> This seems completely wrong:
>
> test=# select * from c order by 1,2;
>  last_name | first_name
> -----------+------------
>  smith     | bob
>  smith     | peter
>  SMITH     | EDWARD
> (3 rows)

Agreed. Definitely a POLA violation.
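
To make the two behaviours concrete, here is a minimal Python sketch. It
is not PostgreSQL code. The helper names (equiv, tie_break, per_key,
all_keys) are hypothetical, case-folding stands in for whatever the
collation treats as equivalent, and the "lowercase before uppercase"
tie-breaker is an assumption chosen only so that the output matches the
listings quoted above.

```python
def equiv(s):
    # Assumed equivalence class: strings that differ only in case.
    return s.casefold()


def tie_break(s):
    # Assumed tie-breaker: lowercase sorts before uppercase (False < True).
    return tuple(c.isupper() for c in s)


def per_key(row):
    # Tie-breaker applied after *each* sort key.
    last, first = row
    return (equiv(last), tie_break(last), equiv(first), tie_break(first))


def all_keys(row):
    # Tie-breaker applied only after equivalence on *all* sort keys.
    last, first = row
    return (equiv(last), equiv(first), tie_break(last), tie_break(first))


rows = [("SMITH", "EDWARD"), ("smith", "peter"), ("smith", "bob")]

per_key_order = sorted(rows, key=per_key)
all_keys_order = sorted(rows, key=all_keys)

print(per_key_order)   # EDWARD ends up after both lowercase "smith" rows
print(all_keys_order)  # EDWARD sorts between bob and peter
```

With the per-key variant, the case of last_name decides the order before
first_name is ever looked at, which is the surprising result.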

> I'm sure the latter is harder to do and slower to execute; but the
> former just doesn't seem defensible as correct.

The author of the Unicode consortium sorting document I linked to has
this same gripe, with a very similar example. So it seems like this
could be a win from several perspectives, as it would also enable the
strxfrm() optimisation. I'm pretty sure that pg_upgrade wouldn't be
very happy about this, so we'd have to have a legacy compatibility
mode.
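
For anyone not familiar with the strxfrm() idea: each string is
transformed once into a key, and plain comparison of those keys is
meant to give the same order as strcoll() on the original strings.
Below is a rough Python sketch using the standard locale module, which
wraps both C functions. The function names are hypothetical, and I pin
the "C" collation only so that the result does not depend on which
locales are installed.

```python
import functools
import locale


def sort_with_strcoll(words):
    # One locale-aware comparison per pair of strings.
    return sorted(words, key=functools.cmp_to_key(locale.strcoll))


def sort_with_strxfrm(words):
    # Transform each string once, then compare the transformed keys.
    return sorted(words, key=locale.strxfrm)


# Assumption: the "C" collation, which is always available.
locale.setlocale(locale.LC_COLLATE, "C")

words = ["peter", "EDWARD", "bob", "Smith", "smith"]
by_strcoll = sort_with_strcoll(words)
by_strxfrm = sort_with_strxfrm(words)

print(by_strcoll == by_strxfrm)
```

As I understand it, a per-key tie-breaker gets in the way here because
equivalent strings can have equal transformed keys, so the original
string would still have to be kept around to break the tie.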

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
