Re: Wildcard search support for pg_trgm

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Wildcard search support for pg_trgm
Date: 2011-01-08 22:37:06
Message-ID: AANLkTi=_Cu5SMi3kdRS32BheoX4mtLPW=4dzuVNpsGQD@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I updated my patch to make it use full index scan in GIN index which is
possible thanks to recent Tom Lane patch. Now wildcard, where no trigram can
be extracted from, invokes full index scan, which is slow but correct.

test=# explain (analyze, buffers) select * from words where word ilike
'%in%';
QUERY PLAN

------------------------------------------------------------------------------------------------------------
Seq Scan on words (cost=0.00..1703.11 rows=15930 width=9) (actual
time=0.333..225.817 rows=16558 loops=1)
Filter: (word ~~* '%in%'::text)
Buffers: shared read=471
Total runtime: 248.207 ms
(4 rows)

test=# set enable_seqscan = off;
SET
test=# explain (analyze, buffers) select * from words where word ilike
'%in%';
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on words (cost=2287.46..2957.59 rows=15930 width=9)
(actual time=122.239..331.993
rows=16558 loops=1)
Recheck Cond: (word ~~* '%in%'::text)
Buffers: shared hit=472 read=1185
-> Bitmap Index Scan on trgm_idx (cost=0.00..2283.48 rows=15930
width=0) (actual time=122.022..122.022 rows=98569 loops=1)
Index Cond: (word ~~* '%in%'::text)
Buffers: shared hit=1 read=1185
Total runtime: 354.409 ms
(7 rows)

As an alternative solution I can propose to extract null item from every
string and ivoke scan on that item instead of full index scan. It requires
to store additional item per each string but it makes full scan fast.

Also I found a segfault when execute the query above and
switch enable_seqscan few times on line "*searchMode =
GIN_SEARCH_MODE_ALL;". Is it a bug in GIN or I'm missing something?

Here goes backtrace from gdb:
#0 0xb4ead070 in gin_extract_query_trgm (fcinfo=0xbfcd8da8) at
trgm_gin.c:112
#1 0x08323a84 in OidFunctionCall5 (functionId=32802, arg1=161269768,
arg2=3217920208, arg3=4,
arg4=3217920204, arg5=3217920200) at fmgr.c:1687
#2 0x082c5654 in gincostestimate (fcinfo=0xbfcd9148) at selfuncs.c:6466
#3 0x083235d8 in OidFunctionCall9 (functionId=2741, arg1=161270176,
arg2=161271296,
arg3=161824624, arg4=0, arg5=0, arg6=3217921064, arg7=3217921056,
arg8=3217921048,
arg9=3217921040) at fmgr.c:1840
#4 0x081f3397 in cost_index (path=0x9a55050, root=0x99cc9a0,
index=0x99cce00,
indexQuals=0x9a53f70, indexOrderBys=0x0, outer_rel=0x0) at
costsize.c:268
#5 0x08216b66 in create_index_path (root=0x99cc9a0, index=0x99cce00,
clause_groups=0x9a53f88,
indexorderbys=0x0, pathkeys=0x0, indexscandir=NoMovementScanDirection,
outer_rel=0x0)
at pathnode.c:511
#6 0x081f7ef5 in find_usable_indexes (root=<value optimized out>,
rel=<value optimized out>,
clauses=<value optimized out>, outer_clauses=0x0, istoplevel=1 '\001',
outer_rel=0x0,
saop_control=SAOP_FORBID, scantype=ST_ANYSCAN) at indxpath.c:422
#7 0x081f8e38 in create_index_paths (root=0x99cc9a0, rel=0x99ccc30) at
indxpath.c:176
#8 0x081eec22 in set_plain_rel_pathlist (root=<value optimized out>,
rel=<value optimized out>,
rti=<value optimized out>, rte=0x99cc650) at allpaths.c:262
#9 set_rel_pathlist (root=<value optimized out>, rel=<value optimized
out>,
rti=<value optimized out>, rte=0x99cc650) at allpaths.c:202
#10 0x081efa55 in set_base_rel_pathlists (root=0x99cc9a0,
joinlist=0x99ccde8) at allpaths.c:158
#11 make_one_rel (root=0x99cc9a0, joinlist=0x99ccde8) at allpaths.c:94
#12 0x08203ef7 in query_planner (root=0x99cc9a0, tlist=0x99ccb00,
tuple_fraction=0,
limit_tuples=-1, cheapest_path=0xbfcd98cc, sorted_path=0xbfcd98c8,
num_groups=0xbfcd98c0)
at planmain.c:271
#13 0x08205b86 in grouping_planner (root=0x99cc9a0, tuple_fraction=0) at
planner.c:1182
#14 0x08207609 in subquery_planner (glob=0x99cc910, parse=0x99cc5a0,
parent_root=0x0,
hasRecursion=0 '\000', tuple_fraction=0, subroot=0xbfcd9a7c) at
planner.c:536
#15 0x08207ca6 in standard_planner (parse=0x99cc5a0, cursorOptions=0,
boundParams=0x0)
at planner.c:201
#16 0x0825db11 in pg_plan_query (querytree=0x99cc5a0, cursorOptions=0,
boundParams=0x0)
at postgres.c:764
#17 0x0815a824 in ExplainOneQuery (stmt=0x9a258e0,
queryString=0x9a24c60 "explain (analyze, buffers) select * from words
where word ilike '%in%';",---Type <return> to continue, or q <return> to
quit---
params=0x0, dest=0x9a32330) at explain.c:300
#18 ExplainQuery (stmt=0x9a258e0,
queryString=0x9a24c60 "explain (analyze, buffers) select * from words
where word ilike '%in%';", params=0x0, dest=0x9a32330) at explain.c:209
#19 0x08261266 in PortalRunUtility (portal=0x9a4d6a8,
utilityStmt=0x9a258e0,
isTopLevel=<value optimized out>, dest=0x9a32330,
completionTag=0xbfcd9bcc "") at pquery.c:1191
#20 0x082622a4 in FillPortalStore (portal=0x9a4d6a8, isTopLevel=32 ' ') at
pquery.c:1065
#21 0x0826281a in PortalRun (portal=0x9a4d6a8, count=2147483647,
isTopLevel=-56 '\310',
dest=0x9a25ee8, altdest=0x9a25ee8, completionTag=0xbfcd9d9c "") at
pquery.c:791
#22 0x0825ec78 in exec_simple_query (query_string=<value optimized out>) at
postgres.c:1059
#23 0x0825fe01 in PostgresMain (argc=2, argv=0x99aeb68, username=0x99aead8
"smagen")
at postgres.c:3939
#24 0x08229250 in BackendRun () at postmaster.c:3577
#25 BackendStartup () at postmaster.c:3262
#26 ServerLoop () at postmaster.c:1448
#27 0x0822be12 in PostmasterMain (argc=3, argv=0x99acc58) at
postmaster.c:1109
#28 0x081ce3b7 in main (argc=3, argv=0x99acc58) at main.c:199

----
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
trgm_wildcard-0.2.patch text/x-patch 16.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-01-08 22:49:43 Re: contrib/intarray (was Re: Fixing GIN for empty/null/full-scan cases)
Previous Message David E. Wheeler 2011-01-08 22:04:03 Re: contrib/intarray (was Re: Fixing GIN for empty/null/full-scan cases)