Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: jurafejfar(at)gmail(dot)com
Subject: Re: BUG #16583: merge join on tables with different DB collation behind postgres_fdw fails
Date: 2021-01-28 12:31:46
Message-ID: facc1d5c-cf6a-46ad-a8a1-cc01617b5ddb@2ndquadrant.com
Lists: pgsql-bugs pgsql-hackers

On 2020-08-18 22:09, Tom Lane wrote:
> Here's a full patch addressing this issue. I decided that the best
> way to address the test-instability problem is to explicitly give
> collations to all the foreign-table columns for which it matters
> in the postgres_fdw test. (For portability's sake, that has to be
> "C" or "POSIX"; I mostly used "C".) Aside from ensuring that the
> test still passes with some other prevailing locale, this seems like
> a good idea since we'll then be testing the case we are encouraging
> users to use.

I have studied this patch and this functionality. I don't think
collation differences between remote and local instances are handled
sufficiently. This bug report and patch address one particular case,
where the database-wide collations of the remote and local instances
differ. But they don't handle cases like the same collation name
doing different things, having different versions, or having different
attributes. This probably works currently because the libc collations
don't have much functionality like that, but there is a variety of work
conceived (or, in the case of version tracking, already done since the
bug was first discussed) that would break that.
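
To make the failure mode concrete, here is a minimal sketch in Python (not PostgreSQL code) of why a merge join returns wrong results when its two inputs were sorted under different ordering rules. The `merge_join` helper and the case-insensitive "remote" ordering are illustrative assumptions standing in for a remote collation that differs from the local one; the keys are assumed unique to keep the loop short.

```python
def merge_join(left, right):
    """Merge join on unique keys.

    Assumes BOTH inputs are sorted ascending under Python's default string
    comparison (code-point order, roughly what collation "C" does).
    """
    i = j = 0
    out = []
    while i < len(left) and j < len(right):
        if left[i] < right[j]:
            i += 1
        elif left[i] > right[j]:
            j += 1
        else:
            out.append(left[i])
            i += 1
            j += 1
    return out


keys = ["a", "B", "c"]

# Local side: code-point order, uppercase sorts before lowercase.
local_sorted = sorted(keys)                  # ['B', 'a', 'c']

# Hypothetical remote side: case-insensitive order, a stand-in for a
# remote collation that does not match the local one.
remote_sorted = sorted(keys, key=str.lower)  # ['a', 'B', 'c']

# The join trusts the remote ordering and silently loses the 'B' match.
broken = merge_join(local_sorted, remote_sorted)

# Re-sorting the remote rows locally restores the join's precondition.
correct = merge_join(local_sorted, sorted(remote_sorted))

print(broken)   # ['a', 'c']
print(correct)  # ['B', 'a', 'c']
```

The same effect would arise if both sides used the same collation name but the name mapped to different behavior or a different version on each instance, since the join only sees the resulting row order.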

Taking a step back, I think there are only two ways this could really
work: either the admin makes a promise that all the collations match on
all the instances, and then the planner can take advantage of that; or
there is no such promise, and then the planner can't. I don't
understand what the currently implemented approach is. It appears to be
something in the middle, where certain representations are made that
certain things might match, and then there is some nontrivial code that
analyzes expressions to check whether they conform to those rules. As you said,
the description of the import_collate option is kind of hand-wavy about
all this.
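
The two-way rule described above can be sketched as a single decision function. This is Python pseudologic, not planner code: `ForeignServer`, its `collations_match` flag, and `remote_order_usable` are hypothetical names invented for illustration and do not correspond to any existing postgres_fdw option or function. The sketch also assumes that comparisons on non-collatable types (such as integers) are unaffected either way.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForeignServer:
    name: str
    # Hypothetical admin promise: "every collation on this server behaves
    # exactly like the collation of the same name locally".
    collations_match: bool = False


def remote_order_usable(server: ForeignServer, collation_sensitive: bool) -> bool:
    """Decide whether remote sort order or comparisons may be relied upon.

    collation_sensitive: True if the expression's result depends on a
    collation (for example, ordering or comparing text columns).
    """
    if not collation_sensitive:
        # Assumption: nothing collation-related can differ here.
        return True
    # With a promise, trust the remote side; without one, never do.
    return server.collations_match


promised = ForeignServer("remote_a", collations_match=True)
unpromised = ForeignServer("remote_b")

print(remote_order_usable(promised, collation_sensitive=True))     # True
print(remote_order_usable(unpromised, collation_sensitive=True))   # False
print(remote_order_usable(unpromised, collation_sensitive=False))  # True
```

The point of the sketch is that there is no middle branch: the decision depends only on whether a promise exists, not on inspecting individual expressions for partial matches.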

--
Peter Eisentraut
2ndQuadrant, an EDB company
https://www.2ndquadrant.com/
