Re: WIP: index support for regexp search

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Erikjan Rijkers <er(at)xs4all(dot)nl>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stěhule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: WIP: index support for regexp search
Date: 2013-04-15 13:53:41
Message-ID: CAPpHfdtTwv_X0ev7H1MtCXSD9MOHFeRkc8-4wzvGyJa_ManHeA@mail.gmail.com
Lists: pgsql-hackers

I see you committed the GiST index implementation. That's cool.
I found an easy way to optimize it: we can also use trigramsMatchGraph()
for signatures. The attached patch contains the implementation.
Here is a simple example demonstrating the effect:

Before the patch:

test=# explain (analyze, buffers) select * from words where s ~ '[abc]def';
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on words  (cost=4.36..40.24 rows=10 width=9) (actual time=17.189..17.193 rows=3 loops=1)
   Recheck Cond: (s ~ '[abc]def'::text)
   Buffers: shared hit=858
   ->  Bitmap Index Scan on words_trgm_idx  (cost=0.00..4.36 rows=10 width=0) (actual time=17.172..17.172 rows=3 loops=1)
         Index Cond: (s ~ '[abc]def'::text)
         Buffers: shared hit=*857*
 Total runtime: 17.224 ms
(7 rows)

After the patch:

test=# explain (analyze, buffers) select * from words where s ~ '[abc]def';
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on words  (cost=4.36..40.24 rows=10 width=9) (actual time=13.718..13.721 rows=3 loops=1)
   Recheck Cond: (s ~ '[abc]def'::text)
   Buffers: shared hit=498
   ->  Bitmap Index Scan on words_trgm_idx  (cost=0.00..4.36 rows=10 width=0) (actual time=13.701..13.701 rows=3 loops=1)
         Index Cond: (s ~ '[abc]def'::text)
         Buffers: shared hit=*497*
 Total runtime: 13.786 ms
(7 rows)
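
Why should it be safe to run the graph check against a signature? Signature
bits are lossy: distinct trigrams can hash to the same bit, so a set bit only
means a trigram *may* be present. Below is a simplified model, not the patch
itself: Signature, trgm_hash(), match_cnf() and consistent() are hypothetical
names, the 251-bit width and the hash are arbitrary, and a hand-written
condition, (ade OR bde OR cde) AND def, stands in for whatever graph
trigramsMatchGraph() would actually evaluate for '[abc]def'. Assuming the
match function is monotone (marking more trigrams as present never turns a
match into a non-match), a colliding bit can only cause an extra subtree
visit, never a skipped one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical signature width: an arbitrary prime number of bits
 * chosen for this sketch, not pg_trgm's real signature length. */
#define SIG_BITS  251
#define SIG_BYTES ((SIG_BITS + 7) / 8)
#define NQ        4            /* number of query trigrams */

typedef struct
{
    uint8_t bits[SIG_BYTES];
} Signature;

/* Query trigrams for the illustrative '[abc]def' example. */
static const char *const query_trgm[NQ] = {"ade", "bde", "cde", "def"};

/* Hand-written stand-in for the regexp graph: each row is an OR-clause of
 * query_trgm indexes terminated by -1, and every row must hold (AND). */
static const int query_cnf[][4] = {{0, 1, 2, -1}, {3, -1}};
#define NCLAUSES ((int) (sizeof(query_cnf) / sizeof(query_cnf[0])))

/* Hypothetical trigram hash: pack the three bytes into an integer and
 * reduce it modulo the signature width.  Distinct trigrams may collide. */
static unsigned trgm_hash(const char *t)
{
    uint32_t v = ((uint32_t) (unsigned char) t[0] << 16) |
                 ((uint32_t) (unsigned char) t[1] << 8) |
                 (uint32_t) (unsigned char) t[2];
    return v % SIG_BITS;
}

/* Set the bit for trigram t, e.g. while building a union signature. */
static void sig_add(Signature *s, const char *t)
{
    unsigned h = trgm_hash(t);
    s->bits[h / 8] |= (uint8_t) (1u << (h % 8));
}

/* True if t's bit is set: true may be a false positive, false is exact. */
static bool sig_test(const Signature *s, const char *t)
{
    unsigned h = trgm_hash(t);
    return (s->bits[h / 8] >> (h % 8)) & 1u;
}

/* Monotone match: every clause needs at least one present trigram, so
 * flipping entries of check[] from false to true never flips true to false. */
static bool match_cnf(const bool *check)
{
    for (int i = 0; i < NCLAUSES; i++)
    {
        bool any = false;
        for (int j = 0; j < 4 && query_cnf[i][j] >= 0; j++)
            any = any || check[query_cnf[i][j]];
        if (!any)
            return false;
    }
    return true;
}

/* Can the subtree summarized by signature s contain a match? */
static bool consistent(const Signature *s)
{
    bool check[NQ];

    for (int k = 0; k < NQ; k++)
        check[k] = sig_test(s, query_trgm[k]);  /* may be a false positive */
    return match_cnf(check);
}
```

With these definitions, a subtree whose signature was built from ade, xyz and
yza is skipped, because no bit for def is set, while one built from bde, def
and efg has to be visited.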

------
With best regards,
Alexander Korotkov.

Attachment Content-Type Size
trgm-regexp-gist-optimize.patch application/octet-stream 1.4 KB
