Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation

From: James Lucas <jlucasdba(at)gmail(dot)com>
To: "David G(dot) Johnston" <david(dot)g(dot)johnston(at)gmail(dot)com>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation
Date: 2020-05-27 16:19:31
Message-ID: CAAFmbbN6iY-g4Of4k5S7t52Cvcvk_bO=SxZOySa6F4pR01eUTw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Hi David,

Thanks for the response. One possibly relevant thing I forgot to
mention. The collation for the database is "en_US.UTF-8", which is
thus also the collation for the t column of ctest.

Per the documentation, it seems putting an implicit collation on the
operation should work. Although the documentation is admittedly a
little vague in this respect. I also found a mail thread in the list
where Peter Eisentraut recommended syntax exactly like this (collate
"C") to work around the inability to use pattern matching on
non-deterministic collation columns. Unfortunately that thread
trailed out without a response if it actually worked.

Noticed something else a bit interesting. Perhaps removing indexes
from the equation would also help:

drop index ctestnd_idx01, ctestnd_idx02, ctest_idx01, ctest_idx02;

explain select * from ctest where t like 'a%' collate "C";
QUERY PLAN
---------------------------------------------------------
Seq Scan on ctest (cost=0.00..1791.00 rows=1 width=10)
Filter: (t ~~ 'a%'::text COLLATE "C")
COMMENT: Okay

explain select * from ctest where t like 'a%' collate mycollation;
QUERY PLAN
---------------------------------------------------------
Seq Scan on ctest (cost=0.00..1791.00 rows=1 width=10)
Filter: (t ~~ 'a%'::text COLLATE mycollation)
COMMENT: Wait, that doesn't seem right.

select * from ctest where t like 'a%' collate mycollation;
ERROR: nondeterministic collations are not supported for LIKE
COMMENT: So in this case, specifying an explicit non-deterministic
collation with EXPLAIN, we get a plan. But when we actually go to
execute, it fails.

explain select * from ctestnd where t like 'a%' collate "C";
ERROR: nondeterministic collations are not supported for LIKE
COMMENT: But in the inverse case, running explain on a column with a
non-deterministic collation, but an explicit deterministic collation,
we don't even get a plan with EXPLAIN. That seems inconsistent. Only
conclusion I can reach is that it's failing a check at an earlier
point in the process than in the other case.

Thanks,
James

On Wed, May 27, 2020 at 10:53 AM David G. Johnston
<david(dot)g(dot)johnston(at)gmail(dot)com> wrote:
>
> On Wed, May 27, 2020 at 8:23 AM James Lucas <jlucasdba(at)gmail(dot)com> wrote:
>>
>>
>> create table ctestnd (id numeric, t text collate mycollation);
>>
>> create index ctestnd_idx02 on ctestnd (t collate "C");
>
>
>>
>> Test on ctestnd:
>> explain select * from ctestnd where t = 'aAa' collate "C";
>> QUERY PLAN
>> ------------------------------------------------------------------------------
>> Index Scan using ctestnd_idx02 on ctestnd (cost=0.42..4.44 rows=1 width=10)
>> Index Cond: (t = 'aAa'::text COLLATE "C")
>> COMMENT: Works as expected.
>
>
> Uses an index scan which is where the deterministic collation exists
>
>>
>>
>> explain select * from ctestnd where t like 'a%';
>> ERROR: nondeterministic collations are not supported for LIKE
>> COMMENT: Fails as expected.
>>
>> explain select * from ctestnd where t like 'a%' collate "C";
>> ERROR: nondeterministic collations are not supported for LIKE
>>
>
> Your schema is inherently unstable in this respect because the planner has to be allowed to choose a sequential scan and as soon as it does it attempts to perform like comparisons with table data that is stored using a non-deterministic collation.
>
> I don't know what kinds of promises we make about implicit collation manipulation here but absent such a transformation the sequential scan plan with LIKE generates an invalid plan choice. That it doesn't go find the index that happens to have a workable collation for the query is unsurprising - whether that is even a possibility is beyond me.
>
> David J.
>

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2020-05-27 18:00:32 BUG #16466: Valgrind detects an invalid read in dblink_open() with a cursor inside a transaction
Previous Message David G. Johnston 2020-05-27 15:53:04 Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation