| From: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
|---|---|
| To: | digoal(at)126(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org, PG Bug reporting form <noreply(at)postgresql(dot)org> |
| Subject: | Re: BUG #15014: pg_trgm regexp with wchar not good? |
| Date: | 2018-01-18 15:15:53 |
| Message-ID: | 18067.1516288553@sss.pgh.pa.us |
| Lists: | pgsql-bugs |

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> when i use pg_trgm's gin index, with wchar search, it's not good for regexp,
> but good for like express.

pg_trgm is going to ignore characters that it doesn't think are letters or
digits. Don't know if the characters you are working with are considered
letters in en_US locale, but if they aren't, that would likely result in
no usable trigrams in this string. Another issue is that "trigrams" are
three *bytes* not three characters, so the useful information per trigram
is a lot lower when working with many-byte characters; that could also
lead to an index search being much less selective than you'd hope.

You might learn something by looking at the result of show_trgm() for
these strings, but I'm thinking there's no bug here, just design
limitations of the trigram approach.

regards, tom lane
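
As a rough illustration of the point above about characters that are not treated as letters or digits, here is a minimal Python sketch of trigram extraction. It is not pg_trgm's code: the function name `simple_trigrams` is hypothetical, and `str.isalnum()` is only an assumed stand-in for the locale-dependent letter/digit test, so the output of show_trgm() on a real server may differ. The two-space prefix and one-space suffix follow the padding convention described in the pg_trgm documentation.

```python
def simple_trigrams(text):
    """Approximate trigram extraction (assumption: not pg_trgm's real logic).

    Characters that fail str.isalnum() act as word separators and are dropped,
    then each word is padded with two leading spaces and one trailing space.
    """
    words = []
    current = []
    for ch in text.lower():
        if ch.isalnum():
            current.append(ch)
        elif current:
            # A non-alphanumeric character ends the current word.
            words.append("".join(current))
            current = []
    if current:
        words.append("".join(current))

    trigrams = set()
    for word in words:
        padded = "  " + word + " "
        # Slide a three-character window over the padded word.
        for i in range(len(padded) - 2):
            trigrams.add(padded[i:i + 3])
    return trigrams


print(sorted(simple_trigrams("cat")))
print(simple_trigrams("!!! ???"))
```

In this sketch, a string made up only of characters that fail the alphanumeric test yields an empty set, which corresponds to the "no usable trigrams" case: an index search then has nothing to filter on.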