Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Cc: Ibrar Ahmed <ibrar(dot)ahmad(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Neil Chen <carpenter(dot)nail(dot)cz(at)gmail(dot)com>
Subject: Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
Date: 2021-09-24 19:11:33
Message-ID: 2438715.1632510693@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com> writes:
> One thing I noticed is that collatable operators/functions sent to the
> remote might also cause an unexpected result when the default
> collations are not compatible. Consider this example (even with your
> patch):
> ...
> where ft1 is a foreign table with an integer column c1. As shown
> above, the sort using the collatable function chr() is performed
> remotely, so the select query might produce the result in an
> unexpected sort order when the default collations are not compatible.

I don't think there's anything really new there --- it's still assuming
that COLLATE "default" means the same locally and remotely.

As a short-term answer, I propose that we apply (and back-patch) the
attached documentation changes.

Longer-term, it seems like we really have to be able to represent
the notion of a remote column that has an "unknown" collation (that
is, one that doesn't match any local collation, or at least is not
known to do so). My previous patch essentially makes "default" act
that way, but conflating "unknown" with "default" has too many
downsides. A rough sketch for making this happen is:

1. Create a built-in "unknown" entry in pg_collation. Insert some
hack or other to prevent this from being applied to any real, local
column; but allow foreign-table columns to have it.

2. Apply mods, probably fairly similar to my patch, that prevent
postgres_fdw from believing that "unknown" matches any local
collation. (Hm, actually maybe no special code change will be
needed here, once "unknown" has its own OID?)

3. Change postgresImportForeignSchema so that it can substitute
the "unknown" collation at need. The exact rules for this could
be debated depending on whether you'd rather prioritize safety or
ease-of-use, but I think at least we should use "unknown" whenever
import_collate is turned off. Perhaps there should be an option
to substitute it for remote "default" as well. (Further down the
road, perhaps that could be generalized to allow a user-controlled
mapping from remote to local collations.)

Anyway, I think I should withdraw the upthread patch; we don't
want to go that way.

regards, tom lane

Attachment Content-Type Size
postgres_fdw-collation-doc-warnings.patch text/x-diff 1.5 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Etsuro Fujita 2021-09-25 13:55:29 Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
Previous Message Andrew Dunstan 2021-09-24 14:58:52 Re: pg_upgrade test for binary compatibility of core data types

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2021-09-24 19:16:20 Re: logical decoding and replication of sequences
Previous Message Robert Haas 2021-09-24 18:48:40 Re: decoupling table and index vacuum