Re: Hash join not finding which collation to use for string hashing

From: Amit Langote <amitlangote09(at)gmail(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Etsuro Fujita <etsuro(dot)fujita(at)gmail(dot)com>
Subject: Re: Hash join not finding which collation to use for string hashing
Date: 2020-01-31 09:37:57
Message-ID: CA+HiwqFjz+SyxU2cGLWz6nnkw5Q2xdjDuOCg4_ryH5eNCw9PEw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 31, 2020 at 6:15 AM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> > On Jan 30, 2020, at 12:29 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> > Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> writes:
> >> Would it make sense to catch a qual with unassigned collation
> >> somewhere in the planner, where the qual's operator family is
> >> estatblished, by checking if the operator family behavior is sensitive
> >> to collations?
> >
> >> I’m not sure how to do that. pg_opfamily doesn’t seem to have a field for that. Can you recommend how I would proceed there?
> >
> > There's no such information attached to opfamilies, which is more or less
> > forced by the fact that individual operators don't expose it either.
> > There's not much hope of making that better without incompatible changes
> > in the requirements for extensions to define operators and/or operator
> > families.
>
> Thanks, Tom, for confirming this.

Just for the record, I will explain why I brought up doing anything
with operator families at all. What I was really imagining is putting
a hard-coded check somewhere in the middle of equivalence processing
to see if a given qual's operator would be sensitive to collation
based *only* on whether it belongs to a text operator family, such as
TEXT_BTREE_FAM_OID, whereas the qual expression's inputcollid is 0
(parser failed to resolve collation conflict among its arguments) and
erroring out if so. If we do that, maybe we won't need
check_collation_set() that's used in various text operators. Also,
erroring out sooner might allow us to produce more specific error
message, which as I understand it, would help with one of the Mark's
complaints that the error message is too ambiguous due to emitted as
such a low layer.

I thought of the idea after running into a recent commit relevant to
non-deterministic collations:

commit 2810396312664bdb941e549df7dfa75218d73a1c
Author: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Date: Sat Sep 21 16:29:17 2019 -0400

Fix up handling of nondeterministic collations with pattern_ops opclasses.

text_pattern_ops and its siblings can't be used with nondeterministic
collations, because they use the text_eq operator which will not behave
as bitwise equality if applied with a nondeterministic collation. The
initial implementation of that restriction was to insert a run-time test
in the related comparison functions, but that is inefficient, may throw
misleading errors, and will throw errors in some cases that would work.
It seems sufficient to just prevent the combination during CREATE INDEX,
so do that instead.

Lacking any better way to identify the opclasses involved, we need to
hard-wire tests for them, which requires hand-assigned values for their
OIDs, which forces a catversion bump because they previously had OIDs
that would be assigned automatically. That's slightly annoying in the
v12 branch, but fortunately we're not at rc1 yet, so just do it.

Discussion: https://postgr.es/m/22566.1568675619@sss.pgh.pa.us

IIUC, the above commit removes a check_collation_set() call from a
operator class comparison function in favor of ensuring that an index
using that operator class can only be defined with a deterministic
collation in the first place. But as the above commit is purportedly
only a stop-gap solution due to lacking operator class infrastructure
to consider collation in operator semantics, maybe we shouldn't spread
such a hack in other places.

Thanks,
Amit

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dean Rasheed 2020-01-31 09:40:32 Re: Marking some contrib modules as trusted extensions
Previous Message Noah Misch 2020-01-31 08:42:13 Re: SimpleLruTruncate() mutual exclusion