From: | Robert Haas <robertmhaas(at)gmail(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Open issues for collations |
Date: | 2011-04-05 14:13:54 |
Message-ID: | BANLkTi=ayf5RT6b6JE7LdaobB7VEYvBvHg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
Reading through this thread...
On Sat, Mar 26, 2011 at 12:36 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> ** Selecting a field from a record-returning function's output.
> Currently, we'll use the field's declared collation; except that
> if the field has default collation, we'll replace that with the common
> collation of the function's inputs, if any. Is either part of that
> sane? Do we need to make this work for functions invoked with other
> syntax than a plain function call, eg operator or cast syntax?
There were a couple of different ideas about which way we ought to go
with this, but I'm happy to defer to what Tom and Martijn hashed out:
MO> That seems all a bit weird. I spent some time reading through the SQL
MO> spec to see if I could came up with a few ideas about what they thought
MO> relevent. I think the gist of it is that I think the result row should
MO> have for each column its declared collation in all cases.
TL> That interpretation would be fine with me. It would let us get rid of
TL> the special-case code at lines 307-324 of parse_collate.c, which I put
TL> in only because there are cases in the collate.linux.utf8.sql regression
TL> test that fail without it. But I'm perfectly happy to conclude that
TL> those test cases are mistaken.
I'm not sure whether that's been done, though, or whether we're even
going to do it.
> ** What to do with domains whose declaration includes a COLLATE clause?
> Currently, we'll impute that collation to the result of a cast to the
> domain type --- even if the cast's input expression includes an
> explicit COLLATE clause. It's not clear that that's per spec. If it
> is correct, should we behave similarly for functions that are declared
> to return a domain type? Should it matter if the cast-to-domain is
> explicit or implicit? Perhaps it'd be best if domain collations only
> mattered for columns declared with that domain type. Then we'd have
> a general rule that collations only come into play in an expression
> as a result of (a) the declared type of a column reference or (b)
> an explicit COLLATE clause.
I think we had agreement than a cast to a domain type with a collation
should stomp on any existing collation on the contained expression.
> * In plpgsql, is it OK for declared local variables to inherit the
> function's input collation? Should we provide a COLLATE option in
> variable declarations to let that be overridden?
I think everyone who responded said "yes" to both questions.
> * RI triggers should insert COLLATE clauses in generated queries to
> satisfy SQL2008 9.13 SR 4a, which says that RI comparisons use the
> referenced column's collation. Right now you may get either table's
> collation depending on which query type is involved. I think an obvious
> failure may not be possible so long as equality means the same thing in
> all collations, but it's definitely possible that the planner might
> decide it can't use the referenced column's unique index, which would
> suck for performance. (Note: this rule seems to prove that the
> committee assumes equality can mean different things in different
> collations, else they'd not have felt the need to specify.)
I tested this and, indeed, if the collations don't match, the index
can't be used.
CREATE TABLE me (x character varying COLLATE "en_US");
CREATE TABLE me2 (x character varying COLLATE "es_ES");
CREATE TABLE you (x character varying COLLATE "es_ES" NOT NULL,
PRIMARY KEY (x));
ALTER TABLE me ADD FOREIGN KEY (x) REFERENCES you(x);
ALTER TABLE me2 ADD FOREIGN KEY (x) REFERENCES you(x);
SET enable_seqscan=false;
SET enable_hashjoin=false;
With that setup, this still does a seqscan-and-sort:
EXPLAIN select * from me, you where me.x = you.x;
But this uses the index:
EXPLAIN select * from me2, you where me2.x = you.x;
I found another problem, too:
rhaas=# insert into you values ('1');
INSERT 0 1
rhaas=# insert into me values ('1');
INSERT 0 1
rhaas=# alter table me alter column x set data type varchar collate "en_GB";
ERROR: could not determine which collation to use for string comparison
HINT: Use the COLLATE clause to set the collation explicitly.
CONTEXT: SQL statement "SELECT fk."x" FROM ONLY "public"."me" fk LEFT
OUTER JOIN ONLY "public"."you" pk ON ( pk."x"::pg_catalog.text
OPERATOR(pg_catalog.=) fk."x"::pg_catalog.text) WHERE pk."x" IS NULL
AND (fk."x" IS NOT NULL)"
> * It'd sure be nice if we had some nontrivial test cases that work in
> encodings besides UTF8. I'm still bothered that the committed patch
> failed to cover single-byte-encoding cases in upper/lower/initcap.
Seems like no one knows how to do this.
> * Remove initdb's warning about useless locales? Seems like pointless
> noise, or at least something that can be relegated to debug mode.
Everyone was in favor of this.
> * Is it worth adding a cares-about-collation flag to pg_proc? Probably
> too late to be worrying about such refinements for 9.1.
Probably too late for this.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
From | Date | Subject | |
---|---|---|---|
Next Message | Tom Lane | 2011-04-05 14:31:54 | Re: time table for beta1 |
Previous Message | Greg Stark | 2011-04-05 14:01:42 | Re: Re: synchronous_commit and synchronous_replication Re: [COMMITTERS] pgsql: Efficient transaction-controlled synchronous replication. |