Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation

From: James Lucas <jlucasdba(at)gmail(dot)com>
To: PostgreSQL mailing lists <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation
Date: 2020-05-28 20:42:07
Message-ID: CAAFmbbOez_LbhD767pPH0LGdEF3=nP9CdtmU+7gYxF_f-_Ce0w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Apologies if anyone gets this twice. I got a rejected mail notice
back the first time I sent.

You are correct. I was playing around with collation naming sometime
back and when I started looking at this, I just used one I had left in
the database assuming it was correct. That's my bad.

I dropped the tables and redefined the collation as
create collation mycollation (provider = icu, locale =
'en-US-u-ks-level2', deterministic = false);

Now the results are more what I expected.

select schemaname, tablename, attname, n_distinct, most_common_vals
from pg_stats where attname='t' and tablename like 'stest%' order by
tablename;
schemaname | tablename | attname | n_distinct | most_common_vals
------------+-----------+---------+------------+---------------------------
public | stest | t | 6 | {aaa,cCc,bBb,bbb,ccc,aAa}
public | stestnd | t | 3 | {ccc,bbb,aaa}

So that is something to be aware of - the collation defined on the
column can impact stats values, which could in turn impact plans
chosen for queries that use alternative collations.

Sorry for the distraction. That still leaves us with the original
issue regarding LIKE and COLLATE.

Thanks,
James

On Thu, May 28, 2020 at 1:48 PM Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
>
> Tom Lane wrote:
>
> > I suspect that the 'en-US-ks-level2' ICU locale doesn't act as you
> > think it does.
>
> Indeed, because the syntax is tricky. The OP wants 'en-US-u-ks-level2'.
> With 'en-US-ks-level2', the ks-level2 component is ignored and you
> get a tertiary colstrength.
>
> Or use 'en-US(at)colStrength=secondary' which is possibly more
> readable and works with older versions of ICU.
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2020-05-28 20:48:00 Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation
Previous Message Daniel Verite 2020-05-28 18:48:38 Re: Explicit deterministic COLLATE fails with pattern matching operations on column with non-deterministic collation