Re: PGDay.it collation discussion notes

From: Martijn van Oosterhout <kleptog(at)svana(dot)org>
To: Gregory Stark <stark(at)enterprisedb(dot)com>
Cc: Postgres <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: PGDay.it collation discussion notes
Date: 2008-10-18 10:51:20
Message-ID: 20081018105120.GA1320@svana.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Oct 18, 2008 at 07:41:12AM +0100, Gregory Stark wrote:
> The ANSI standard syntax where the COLLATION keyword can follow just about any
> string value in an expression and then bubbles up the expression until an
> operation needs to pick a collation seemed very weird to us. Hooking that into
> the parser was pretty intimidating but we thought we could postpone that till
> later and solve the meat of the problem first.

It's quite straightforward. Every expression has a collation, the COLLATE
keyword just overrides it. And the collation is a parameter of the
operators/functions that want to use it. Implementation is also
straightforward: add expr :: expr COLLATE ident to the bison code and a
collation oid to the Expr node. The code to bubble up the collation ids
would be in the patch I posted a few years back.

> pg_index would need an additional column to go along with indkey and
> indoptions. It would be an oidvector of collation oids.

Have you decided what the collation oid will refer to? If the index is
a descending index, does that affect the collation oid?

> When planning a query or building an index we need to determine whether the
> opclass is collation-aware or not. It seemed best that we not hard code which
> data types are collation aware and instead look at either the operator or the
> pg_proc entry for the function implementing the sort operator or perhaps the
> btproc for the opclass to see if it takes a third argument. Or perhaps we
> would have a boolean column in pg_opclass which asserts they do and that it
> should be the collation. If so then the collation is stored in the sort key
> and the indcollation slots and is passed to the operator or the btproc
> function as the third argument.

The way I approached it was to just mark the opclass (perhaps opfamily
would be better now, it didn't exist at the time). Additionally you
will need to mark the individual operators/functions as to whether
they're sensetive to the collation or not. This is needed because an
Error state in the collation oid is only an error if applied to an
operator that cares. This might not be needed in the first
implementation, though.

> We think even without the parser changes this would be useful enough to take
> on its own. It would allow having indexes built in different collations and
> have different sessions use different collations. You could even hack queries
> which combine collations by defining a function handle subparts of the query
> and attach the guc the same way we attach the search_path to security definer
> functions.

Ugh. I think just implementing the parser changes will be less work.

> To finish the feature the catalog needs to add a default collation to every
> column and some other database objects. Then the parser needs to check those
> objects first before falling back to the session variable. Then it needs to
> bubble that value up as it builds expressions so that it's available at the
> comparison operator or sort node. We didn't read the part of the spec which
> covered the rules for this but it does have rules which should turn up if you
> search for "collation derivation".

Once you've added the default collation to every object, the session
variable becomes redundant. Even unknown text strings will have a
collation oid as soon as they are cast to text, since the text type
will have a default collation (just like it has a default operator
class).

Nice to see some progress being made here.

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2008-10-18 12:55:44 ALTER TABLE ... SET DATA TYPE (SQL:2008)
Previous Message Dave Gudeman 2008-10-18 09:55:23 adding collation to a SQL database