Re: insensitive collations

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Peter Eisentraut" <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: "Peter Geoghegan" <pg(at)bowt(dot)ie>,"pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: insensitive collations
Date: 2019-03-04 14:58:58
Message-ID: 9bd25519-c62d-4aa2-9189-f471c5d09e8a@manitou-mail.org
Lists: pgsql-hackers

Peter Eisentraut wrote:

[v7-0001-Collations-with-nondeterministic-comparison.patch]

+GenericMatchText(const char *s, int slen, const char *p, int plen, Oid collation)
{
+    if (collation && !lc_ctype_is_c(collation) && collation != DEFAULT_COLLATION_OID)
+    {
+        pg_locale_t locale = pg_newlocale_from_collation(collation);
+
+        if (locale && !locale->deterministic)
+            ereport(ERROR,
+                    (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+                     errmsg("nondeterministic collations are not supported for LIKE")));
+    }

This test gets side-stepped when pattern_fixed_prefix() in selfuncs.c
returns Pattern_Prefix_Exact, and the code optimizes the operation by
converting it to a bytewise equality test, or a bytewise range check
in the index with Pattern_Type_Prefix.
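
To illustrate why the bytewise shortcut is not equivalent under such a collation, here is a minimal standalone sketch. It is not PostgreSQL code: ci_equal() is a hypothetical ASCII-only stand-in for a case-insensitive comparison (the real collation would go through ICU), and bytewise_equal() models what the optimized qual effectively tests.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical stand-in for a nondeterministic comparison: ASCII
 * case-insensitive equality.  A real 'und-u-ks-level1' collation is
 * implemented by ICU and also ignores accents; this is only a model.
 */
static bool
ci_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    while (*a != '\0' && *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return false;
        a++;
        b++;
    }
    /* Equal only if both strings ended at the same position. */
    return *a == *b;
}

/* Model of the optimized path: plain bytewise equality. */
static bool
bytewise_equal(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);

    return strcmp(a, b) == 0;
}
```

With these two helpers, ci_equal("abc", "ABC") is true while bytewise_equal("abc", "ABC") is false, so a plan that substitutes the second test for the first can silently miss rows.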

Here's a reproducer:

===
create collation ciai (locale='und-u-ks-level1', deterministic=false,
provider='icu');

create table w(t text collate "C");

insert into w select md5(i::text) from generate_series(1,10000) as i;
insert into w values('abc');

create index indexname on w(t);

select t from w where t like 'ABC' collate ciai;
t
---
(0 rows)

select t from w where t like 'ABC%' collate ciai;
t
---
(0 rows)

===

For the LIKE operator, I think the fix is for like_fixed_prefix() to
always return Pattern_Prefix_None for nondeterministic collations.
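
A rough standalone model of that idea, to make the proposal concrete. This is not the actual selfuncs.c code: SketchLocale and sketch_like_fixed_prefix() are hypothetical names, the enum is redeclared locally so the sketch compiles on its own, and escape handling is reduced to stopping at the first backslash.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local redeclaration for the sketch; mirrors the names used above. */
typedef enum
{
    Pattern_Prefix_None,
    Pattern_Prefix_Partial,
    Pattern_Prefix_Exact
} Pattern_Prefix_Status;

/* Hypothetical stand-in for the one pg_locale_t field that matters here. */
typedef struct
{
    bool deterministic;
} SketchLocale;

/*
 * Simplified model of LIKE prefix extraction.  Copies the literal
 * characters that precede the first wildcard or backslash into
 * "prefix" and reports how much of the pattern they cover.
 */
static Pattern_Prefix_Status
sketch_like_fixed_prefix(const char *pattern, const SketchLocale *locale,
                         char *prefix, size_t prefix_size)
{
    size_t n = 0;

    assert(pattern != NULL && prefix != NULL && prefix_size > 0);
    prefix[0] = '\0';

    /* Proposed guard: never derive a bytewise prefix in this case. */
    if (locale != NULL && !locale->deterministic)
        return Pattern_Prefix_None;

    while (pattern[n] != '\0' && pattern[n] != '%' &&
           pattern[n] != '_' && pattern[n] != '\\' &&
           n + 1 < prefix_size)
    {
        prefix[n] = pattern[n];
        n++;
    }
    prefix[n] = '\0';

    if (pattern[n] == '\0')
        return Pattern_Prefix_Exact;    /* whole pattern is literal */
    return (n > 0) ? Pattern_Prefix_Partial : Pattern_Prefix_None;
}
```

In this model the patterns 'ABC' and 'ABC%' yield Pattern_Prefix_Exact and Pattern_Prefix_Partial for a deterministic locale, and Pattern_Prefix_None for a nondeterministic one, so no bytewise index condition would be built and the check in GenericMatchText() would be reached.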

For regular expressions, pg_set_regex_collation() is called at some
point when checking for a potential prefix, and since it errors out with
non-deterministic collations, this issue is taken care of already.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
