Re: Theory of operation of collation patch

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Theory of operation of collation patch
Date: 2011-03-08 00:15:28
Message-ID: 18217.1299543328@sss.pgh.pa.us
Lists: pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> On Mon, Mar 07, 2011 at 11:43:20AM -0500, Tom Lane wrote:
>> Is there any documentation of $SUBJECT?

> The collation is a property of the operators/functions and not of the
> values. An individual value does not have a collation, a column does.

OK.

> A pathkey represents a sort order, right? To define a sort order you
> need a collation and so the path key is the natural place to put it.

Only if the expression-to-be-sorted does not already fully specify the
collation, which so far as I can tell (either from the code or your
description above) it does. I think that the explicit representation
of collation as part of the PathKey node is unnecessary, inefficient,
and bug-inducing --- the latter because it promotes fuzzy thinking about
where the collation information is coming from. (And this isn't just
hypothetical: IMO the bugs I exhibited upthread are *directly* due to
fuzzy thinking about what defines an index's sort order.)

Or, to put it another way: the properties that define a sort order are
the sort comparison operator, the collation, the ASC/DESC bit, and the
NULLS FIRST/LAST bit. Given the way that the SQL committee has
constructed the language, the operator and the two flag bits are
attached to the ORDER BY clause, but the collation is a property of the
expression-to-be-sorted. If we fail to preserve that distinction in the
internal representation, we're just creating problems for ourselves.
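
To make that distinction concrete, here is a minimal sketch in Python (not PostgreSQL's actual planner structures). The names Expr, SortClause, sort_order, apply_sort, and the toy collation "toy_ci" are all hypothetical, chosen for illustration. The sort clause holds only the operator and the two flag bits, and the collation is always read from the expression being sorted.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical toy collations (name -> sort key function).
# These stand in for real collations and are NOT PostgreSQL's catalog.
COLLATIONS = {
    "C": lambda s: s,               # plain code-point order
    "toy_ci": lambda s: s.lower(),  # case-insensitive, illustration only
}


@dataclass(frozen=True)
class Expr:
    """Expression to be sorted; it carries its own collation."""
    name: str
    collation: str


@dataclass(frozen=True)
class SortClause:
    """What ORDER BY attaches: operator and two flag bits, no collation."""
    expr: Expr
    operator: str = "<"
    descending: bool = False
    nulls_first: bool = False


def sort_order(clause: SortClause) -> tuple:
    """Return the four properties that define a sort order.

    The collation is looked up from the expression, never stored a
    second time alongside the clause.
    """
    return (clause.operator, clause.expr.collation,
            clause.descending, clause.nulls_first)


def apply_sort(values: List[Optional[str]], clause: SortClause) -> List[Optional[str]]:
    """Sort values per the clause; this toy only understands the "<" operator."""
    key = COLLATIONS[clause.expr.collation]
    non_null = sorted((v for v in values if v is not None),
                      key=key, reverse=clause.descending)
    nulls = [v for v in values if v is None]
    return nulls + non_null if clause.nulls_first else non_null + nulls


values = ["b", "A", None, "a", "B"]
by_c = apply_sort(values, SortClause(Expr("name", "C")))
by_ci = apply_sort(values, SortClause(Expr("name", "toy_ci")))
```

In this sketch the two clauses are identical except for the expression they point at, yet they produce different orderings, because the collation travels with the expression rather than with the sort specification.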

I'm willing to take a pass at fixing this code during the alpha cycle,
but I want to be sure I understand it correctly first. So, if there's
a hole in my thinking, please point it out.

regards, tom lane