Re: Theory of operation of collation patch

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Theory of operation of collation patch
Date: 2011-03-08 08:16:06
Message-ID: 20110308081606.GA6725@svana.org
Lists: pgsql-hackers

On Mon, Mar 07, 2011 at 07:15:28PM -0500, Tom Lane wrote:
> Only if the expression-to-be-sorted does not already fully specify the
> collation, which so far as I can tell (either from the code or your
> description above) it does. I think that the explicit representation
> of collation as part of the PathKey node is unnecessary, inefficient,
> and bug-inducing --- the latter because it promotes fuzzy thinking about
> where the collation information is coming from. (And this isn't just
> hypothetical: IMO the bugs I exhibited upthread are *directly* due to
> fuzzy thinking about what defines an index's sort order.)

Well, collation processing happens in two phases. Initially, collation
information is provided by the columns in the query, explicit clauses,
etc. These are indeed attached to the values. From here the collations
of expressions are determined. The SQL committee thought up a bunch of
actually quite logical rules here: explicit overrides implicit, and
implicit overrides default. Two conflicting explicit or two conflicting
implicit collations is an error, etc. Note that the error state is only
a problem if you use it for sorting or comparison; otherwise it is
ignored.
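
As a rough sketch of those rules (my own illustration in Python, not code
from the patch or from PostgreSQL; the names Strength, Derivation, combine
and require_collation are all made up for the example), the derivation
could be modelled like this, with the conflict kept as a silent "error
state" until something actually needs a collation:

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Optional


class Strength(IntEnum):
    """How a collation was derived; a higher value wins over a lower one."""
    DEFAULT = 0
    IMPLICIT = 1
    EXPLICIT = 2


@dataclass(frozen=True)
class Derivation:
    strength: Strength
    # None is the "error state": two collations of equal strength conflicted.
    collation: Optional[str]


def combine(a: Derivation, b: Derivation) -> Derivation:
    """Combine the derivations of two arguments of one expression."""
    if a.strength != b.strength:
        # Explicit overrides implicit, implicit overrides default.
        return a if a.strength > b.strength else b
    if a.collation == b.collation:
        return a
    # Same strength, different collations: remember the conflict, but do
    # not complain yet. It only matters if a sort or comparison needs it.
    return Derivation(a.strength, None)


def require_collation(d: Derivation) -> str:
    """Called only where it matters (comparison, ORDER BY and similar)."""
    if d.collation is None:
        raise ValueError("could not determine which collation to use")
    return d.collation


# Hypothetical inputs: two columns with different implicit collations.
german = Derivation(Strength.IMPLICIT, "de_DE")
french = Derivation(Strength.IMPLICIT, "fr_FR")
conflict = combine(german, french)
resolved = combine(conflict, Derivation(Strength.EXPLICIT, "C"))
```

Here `conflict` carries no usable collation, but nothing fails until
require_collation() is called on it, and an explicit clause further up
the expression (as in `resolved`) still overrides it.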

This phase of processing happens during parse analysis, the end result
being that every expression should have a collation set, and every
operator where it matters has consistent collation information for its
arguments. So at this point the collation should be attached to the
expression.

In the planning phase, however, all collation information is ignored
except where it matters, which is the comparison operators and ORDER BY
and similar. By the end of the previous phase, every comparison operator
should have a defined collation for its arguments and thus for itself;
this allows the planner to construct appropriate pathkeys for index
scans, etc.

I haven't looked at the patch, so perhaps it confuses the information in
the two phases, but the basic idea seems to me to be:

- parse analysis - collation in expressions
- planning - collation in path key
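
To make that split concrete, here is a minimal sketch (again my own
illustration, not the planner's actual data structures; Expr, PathKey,
make_pathkey and index_satisfies are hypothetical names). Parse analysis
leaves a collation on the expression, and planning copies it into the
path key so that it can be compared against an index's sort order:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Expr:
    """Output of parse analysis: an expression with its collation set."""
    column: str
    collation: str


@dataclass(frozen=True)
class PathKey:
    """Planner's description of one sort column (simplified)."""
    column: str
    operator: str
    collation: str
    descending: bool = False
    nulls_first: bool = False


def make_pathkey(expr: Expr, operator: str = "<",
                 descending: bool = False,
                 nulls_first: bool = False) -> PathKey:
    # The operator and the two flags come from the ORDER BY clause;
    # the collation is taken from the expression being sorted.
    return PathKey(expr.column, operator, expr.collation,
                   descending, nulls_first)


def index_satisfies(index_key: PathKey, wanted: PathKey) -> bool:
    # An index scan can supply the ordering only if every property
    # that defines the sort order matches, collation included.
    return index_key == wanted


# Hypothetical index on "name" built with the "C" collation.
index_key = make_pathkey(Expr("name", "C"))
same_collation = make_pathkey(Expr("name", "C"))
other_collation = make_pathkey(Expr("name", "de_DE"))
```

With these definitions the index matches an ORDER BY on the same column
and collation, but not one whose expression resolved to "de_DE".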

> Or, to put it another way: the properties that define a sort order are
> the sort comparison operator, the collation, the ASC/DESC bit, and the
> NULLS FIRST/LAST bit. Given the way that the SQL committee has
> constructed the language, the operator and the two flag bits are
> attached to the ORDER BY clause, but the collation is a property of the
> expression-to-be-sorted. If we fail to preserve that distinction in the
> internal representation, we're just creating problems for ourselves.

Assigning the collation to expressions is simply the method for
getting the collation information to the operators, mainly because 99%
of the time there will be no explicit clauses in the queries, so the
information has to get there by other means.

Hope this helps,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
> - Charles de Gaulle
