BUG #15014: pg_trgm regexp with wchar not good?

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: digoal(at)126(dot)com
Subject: BUG #15014: pg_trgm regexp with wchar not good?
Date: 2018-01-18 13:03:46
Message-ID: 20180118130346.3630.56019@wrigleys.postgresql.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 15014
Logged by: Zhou Digoal
Email address: digoal(at)126(dot)com
PostgreSQL version: 10.1
Operating system: CentOS 7.x.x64
Description:

HI,

when i use pg_trgm's gin index, with wchar search, it's not good for regexp,
but good for like express.

```
postgres=# \l
List of databases
Name | Owner | Encoding | Collate | Ctype | Access
privileges
-----------+----------+----------+------------+------------+-----------------------
postgres | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 |
template0 | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 | =c/postgres
+
| | | | |
postgres=CTc/postgres
template1 | postgres | UTF8 | en_US.UTF8 | en_US.UTF8 | =c/postgres
+

create or replace function gen_hanzi(int) returns text as $$
declare
res text;
begin
if $1 >=1 then
select string_agg(chr(19968+(random()*20901)::int), '') into res from
generate_series(1,$1);
return res;
end if;
return null;
end;
$$ language plpgsql strict;

postgres=# create table test(id int, info text);
CREATE TABLE
postgres=# insert into test select generate_series(1,100000),
gen_hanzi(100);
INSERT 0 100000
postgres=# create index idx_test_1 on test using gin (info gin_trgm_ops);
CREATE INDEX

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test
where info ~ '婐绷乂畳';
QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.test (cost=45261409.28..45261421.30 rows=10
width=36) (actual time=583.810..816.503 rows=1 loops=1)
Output: id, info
Recheck Cond: (test.info ~ '婐绷乂畳'::text)
Rows Removed by Index Recheck: 99999
Heap Blocks: exact=4167
Buffers: shared hit=59783
-> Bitmap Index Scan on idx_test_1 (cost=0.00..45261409.28 rows=10
width=0) (actual time=583.237..583.237 rows=100000 loops=1)
Index Cond: (test.info ~ '婐绷乂畳'::text)
Buffers: shared hit=55616
Planning time: 0.150 ms
Execution time: 816.545 ms
(11 rows)

postgres=# explain (analyze,verbose,timing,costs,buffers) select * from test
where info ~ '123';
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on public.test (cost=39.40..2897.60 rows=4000 width=36)
(actual time=0.046..0.046 rows=0 loops=1)
Output: id, info
Recheck Cond: (test.info ~ '123'::text)
Buffers: shared hit=4
-> Bitmap Index Scan on idx_test_1 (cost=0.00..38.40 rows=4000 width=0)
(actual time=0.043..0.043 rows=0 loops=1)
Index Cond: (test.info ~ '123'::text)
Buffers: shared hit=4
Planning time: 0.146 ms
Execution time: 0.072 ms
(9 rows)
```

best regards,
digoal

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2018-01-18 15:15:53 Re: BUG #15014: pg_trgm regexp with wchar not good?
Previous Message Kyotaro HORIGUCHI 2018-01-18 12:14:49 Re: Index-only scan returns incorrect results when using a composite GIST index with a gist_trgm_ops column.