Re: Hash join not finding which collation to use for string hashing

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Subject: Re: Hash join not finding which collation to use for string hashing
Date: 2020-01-30 20:17:32
Message-ID: CA+TgmoZRXSadSZ27oAcx_OBnHnQ8AKmEin7wXV3KYypPOkBb5g@mail.gmail.com
Lists: pgsql-hackers

On Thu, Jan 30, 2020 at 2:44 PM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> 3) Extend the concept of collations to collation sets. Right now, I’m only thinking about a collation set as having two values, the lefthand and the righthand side, but maybe there are other cases like (Left, (Left,Right)) that get built up and need to work. Anyway, at the point in the executor that the collations don’t match, instead of passing NULL down the line, pass in a collation set (Left, Right), and functions like texteq can see that they’re dealing with two different collations and decide if they can deal with that or if they need to throw an error.
>
> I bet if we went with (3), the error being thrown in the example I used to start this thread would go away, without breaking anything else. I’m going to go poke at that a bit, but I’d still appreciate any comments/concerns about my analysis.

I assume that what would have to happen to implement this is that an
SQL-callable function would be passed more than one collation OID,
perhaps one per argument or something like that. Notice, however, that
this would require changing the way that functions get called. See the
DirectFunctionCall{1,2,3,...}Coll() and
FunctionCall{0,1,2,3,...}Coll() and the definition of
FunctionCallInfoBaseData -- there's only one slot for a collation OID
(fncollation) right now. Allowing for more would likely have a noticeable impact on
the cost of calling SQL-callable functions, and that's already
expensive enough that people have been unhappy about it. It seems
unlikely that it would be worth incurring more overhead here for every
query all the time just to make this case work.

It is, perhaps, a little strange that the only two choices for an
operator are "cares about collation" and "doesn't," and I somehow feel
like there ought to be a way to do better. But I don't know what it
is.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
