Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: jurafejfar(at)gmail(dot)com, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
Date: 2020-08-17 22:37:47
Message-ID: 1156217.1597703867@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

I wrote:
> So I think what is happening here is that postgres_fdw's version of
> IMPORT FOREIGN SCHEMA translates "COLLATE default" on the remote
> server to "COLLATE default" on the local one, which of course is
> a big fail if the defaults don't match. That allows the local
> planner to believe that remote ORDER BYs on the two foreign tables
> will give compatible results, causing the merge join to not work
> very well at all.

> We probably need to figure out some way of substituting the remote
> database's actual lc_collate setting when we see "COLLATE default".

Here's a draft patch for that part. There's a few things to quibble
about:

* It tests for "COLLATE default" by checking whether pg_collation.oid
is DEFAULT_COLLATION_OID, thus assuming that that OID will never change.
I think this is safer than checking the collation name, but maybe
somebody else would have a different opinion? Another idea is to check
whether collprovider is 'd', but that only works with v10 and up.

* It might not be able to find a remote collation matching the database's
datcollate/datctype. As coded, we'll end up creating the local column
with "COLLATE default", putting us back in the same hurt we're in now.
I think this is okay given the other planned change to interpret "COLLATE
default" as "we don't know what collation this is". In any case it's hard
to see what else we could do, other than fail entirely.

* Alternatively, it might find more than one such remote collation;
indeed that's the norm, eg we'd typically find both "en_US" and
"en_US.utf8", or the like. I made it choose the shortest collation
name in such cases, but maybe there is a case for the longest?
I don't much want it to pick "ucs_basic" over "C", though.

* The whole thing is certain to fall over whenever we find a way to
allow ICU collations as database defaults. While we can presumably
fix the query when we make that change, existing postgres_fdw releases
would not work against a newer server. Probably there's little to be
done about this, either.

* As shown by the expected-output changes, there are some test cases
that expose that we're not picking the default collation anymore.
That creates a testing problem: this can't be committed as-is because
it'll fail with any other locale environment than what the expected
file was made with. We could lobotomize the test cases to not print
the column collation, but then we're not really verifying that this
code does what it's supposed to. Not sure what the best compromise is.

Comments?

regards, tom lane

Attachment Content-Type Size
dont-import-collations-as-default-1.patch text/x-diff 8.0 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2020-08-18 13:43:40 BUG #16584: could not create relation-cache initialization file "global/pg_internal.init.2002": No space left
Previous Message Cyril Jouve 2020-08-17 20:48:11 Re: BUG #16508: using multi-host connection string when the first host is starting fails

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Wong 2020-08-17 23:30:07 Re: doc examples for pghandler
Previous Message Alvaro Herrera 2020-08-17 20:21:01 Re: [BUG] Error in BRIN summarization