Re: `pg_trgm` not recognizing Chinese characters in macOS

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Haotian Yang <yangnw(at)live(dot)com>
Cc: "pgsql-bugs(at)postgresql(dot)org" <pgsql-bugs(at)postgresql(dot)org>
Subject: Re: `pg_trgm` not recognizing Chinese characters in macOS
Date: 2018-09-11 13:20:13
Message-ID: 18165.1536672013@sss.pgh.pa.us
Lists: pgsql-bugs

Haotian Yang <yangnw(at)live(dot)com> writes:
> Versions: macOS 10.13.6, PostgreSQL 10.5, pg_trgm 1.3.
> LC_ALL=en_US.UTF-8

pg_trgm relies on libc's functions (specifically, iswalpha()) to determine
what is a word character or not. Unfortunately, the UTF8 locale support
in macOS is pretty incomplete, and I don't find it too surprising that
it's not recognizing Chinese characters as alphabetic. Now, you could
make a good argument that they *shouldn't* be considered alphabetic in
an en_US locale; but I'm unsure whether switching to a more appropriate
locale will help.

Anyway, I'd first try zh_CN.UTF-8, and if that doesn't fix it, the place
to complain is https://bugreport.apple.com/ ... I'm sure they know about
it already, but the number of reports has an impact on how fast they
fix things.

regards, tom lane
