Re: Open issues for collations

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Open issues for collations
Date: 2011-03-26 09:38:56
Message-ID: AANLkTimt1YWWSKQbwNRBd1ZB+WCwS6gbda0touVRc60a@mail.gmail.com
Lists: pgsql-hackers

On Sat, Mar 26, 2011 at 4:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>> I think some discussion of which of the things on the open
>> item lists need to be done before beta might be helpful, and we ought
>> to add any items that are not there but are blockers.
>
> Here's a quick enumeration of some things I think need discussion about
> the collations patch:
>
> * Are we happy yet with the collation assignment behavior (see
> parse_collate.c)?  A couple of specific subtopics:
>
> ** Selecting a field from a record-returning function's output.
> Currently, we'll use the field's declared collation; except that
> if the field has default collation, we'll replace that with the common
> collation of the function's inputs, if any.  Is either part of that
> sane?  Do we need to make this work for functions invoked with other
> syntax than a plain function call, eg operator or cast syntax?
>
> ** What to do with domains whose declaration includes a COLLATE clause?
> Currently, we'll impute that collation to the result of a cast to the
> domain type --- even if the cast's input expression includes an
> explicit COLLATE clause.  It's not clear that that's per spec.  If it
> is correct, should we behave similarly for functions that are declared
> to return a domain type?  Should it matter if the cast-to-domain is
> explicit or implicit?  Perhaps it'd be best if domain collations only
> mattered for columns declared with that domain type.  Then we'd have
> a general rule that collations only come into play in an expression
> as a result of (a) the declared type of a column reference or (b)
> an explicit COLLATE clause.
>
>
> * In plpgsql, is it OK for declared local variables to inherit the
> function's input collation?  Should we provide a COLLATE option in
> variable declarations to let that be overridden?  If Oracle understands
> COLLATE, probably we should look at what they do in PL/SQL.
>
> * RI triggers should insert COLLATE clauses in generated queries to
> satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the
> referenced column's collation.  Right now you may get either table's
> collation depending on which query type is involved.  I think an obvious
> failure may not be possible so long as equality means the same thing in
> all collations, but it's definitely possible that the planner might
> decide it can't use the referenced column's unique index, which would
> suck for performance.  (Note: this rule seems to prove that the
> committee assumes equality can mean different things in different
> collations, else they'd not have felt the need to specify.)
>
> * It'd sure be nice if we had some nontrivial test cases that work in
> encodings besides UTF8.  I'm still bothered that the committed patch
> failed to cover single-byte-encoding cases in upper/lower/initcap.
>
> * Remove initdb's warning about useless locales?  Seems like pointless
> noise, or at least something that can be relegated to debug mode.
>
> * Is it worth adding a cares-about-collation flag to pg_proc?  Probably
> too late to be worrying about such refinements for 9.1.
>
> There are a bunch of other minor issues that I'm still working through,
> but these are the ones that seem to merit discussion.

That's a long list and I think it's clear that we won't resolve all of
those issues to everybody's satisfaction in a single release, let
alone in the next week or so. We need a way forwards.

What I think we should do is add detailed documentation on how it
works now. There are many people who would love to help, but not
everybody can visualise exactly the points you are making above; I
confess that I can't. Having docs that clearly explain a neat
new capability and the various possible gotchas and caveats will help
others come up with test cases and ideas.

It seems to me likely that in real usage many of those gotchas will
drop away because they represent unlikely or perverse use cases.

I don't see anything bad in releasing software that has unresolved
questions, as long as those items are clearly flagged up and we
specifically ask for feedback on them. That looks to me to be a
many-eyeballs approach to the problem.
http://en.wikipedia.org/wiki/Linus%27_Law

Tucking our shoelaces into our shoes doesn't mean the loose ends are
resolved fully.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
