Re: Nondeterministic collations vs. text_pattern_ops

From: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Nondeterministic collations vs. text_pattern_ops
Date: 2019-09-17 08:35:22
Message-ID: 57e4d38e-eb2c-9a77-5a7c-b94396187cf9@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2019-09-17 01:13, Tom Lane wrote:
> Whilst poking at the leakproofness-of-texteq issue, I realized
> that there's an independent problem caused by the nondeterminism
> patch. To wit, that the text_pattern_ops btree opclass uses
> texteq as its equality operator, even though that operator is
> no longer guaranteed to be bitwise equality. That means that
> depending on which collation happens to get attached to the
> operator, equality might be inconsistent with the other members
> of the opclass, leading to who-knows-what bad results.

You can't create a text_pattern_ops index on a column with
nondeterministic collation:

create collation c1 (provider = icu, locale = 'und', deterministic = false);
create table t1 (a int, b text collate c1);
create index on t1 (b text_pattern_ops);
ERROR: nondeterministic collations are not supported for operator class
"text_pattern_ops"

There is some discussion in internal_text_pattern_compare().

Are there other cases we need to consider?

I notice that there is a hash opclass text_pattern_ops, which I'd
actually never heard of until now, and I don't see documented. What
would we need to do about that?

> The obvious fix for this is to invent separate new equality operators,
> but that's actually rather disastrous for performance, because
> text_pattern_ops indexes would no longer be able to use WHERE clauses
> using plain equality. That also feeds into whether equality clauses
> deduced from equivalence classes will work for them (nope, not any
> more). People using such indexes are just about certain to be
> bitterly unhappy.

Would it help if one created COLLATE "C" indexes instead of
text_pattern_ops? What are the tradeoffs between the two approaches?

--
Peter Eisentraut http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Richard Guo 2019-09-17 08:41:34 Re: Pulling up direct-correlated ANY_SUBLINK
Previous Message Pavel Stehule 2019-09-17 08:22:35 Re: Feature request: binary NOTIFY