Re: WIP: index support for regexp search

From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Alexander Korotkov" <aekorotkov(at)gmail(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: index support for regexp search
Date: 2012-01-21 07:03:04
Message-ID: d9141ddf8c4e763f57f41661c9af699d.squirrel@webmail.xs4all.nl
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, January 21, 2012 06:26, Alexander Korotkov wrote:
> Hi!
>
> Thank you for your feedback!
>
> On Fri, Jan 20, 2012 at 3:33 AM, Erik Rijkers <er(at)xs4all(dot)nl> wrote:
>
>> The patch yields spectacular speedups with small, simple-enough regexen.
>> But it does not do a
>> good enough job when guessing where to use the index and where fall back
>> to Seq Scan. This can
>> lead to (also spectacular) slow-downs, compared to Seq Scan.
>>
> Could you give some examples of regexes where index scan becomes slower
> than seq scan?
>

x[aeio]+q takes many minutes, uninterruptible. It's now running for almost 30 minutes, I'm sure
it will come back eventually, but I think I'll kill it later on; I suppose you get the point ;-)

Even with {n,m} quantifiers it's easy to hit slowdowns:

(table azjunk6 is 112 MB, the index 693 MB.)
(MAX_COLOR_CHARS=4 <= your original patch)

instance table regex plan time

HEAD azjunk6 x[aeio]{1,3}q Seq Scan 3566.088 ms
HEAD azjunk6 x[aeio]{1,3}q Seq Scan 3540.606 ms
HEAD azjunk6 x[aeio]{1,3}q Seq Scan 3495.034 ms
HEAD azjunk6 x[aeio]{1,3}q Seq Scan 3510.403 ms
trgm_regex azjunk6 x[aeio]{1,3}q Bitmap Heap Scan 3724.131 ms
trgm_regex azjunk6 x[aeio]{1,3}q Bitmap Heap Scan 3844.999 ms
trgm_regex azjunk6 x[aeio]{1,3}q Bitmap Heap Scan 3835.190 ms
trgm_regex azjunk6 x[aeio]{1,3}q Bitmap Heap Scan 3724.016 ms

HEAD azjunk6 x[aeio]{1,4}q Seq Scan 3617.997 ms
HEAD azjunk6 x[aeio]{1,4}q Seq Scan 3644.215 ms
HEAD azjunk6 x[aeio]{1,4}q Seq Scan 3636.976 ms
HEAD azjunk6 x[aeio]{1,4}q Seq Scan 3625.493 ms
trgm_regex azjunk6 x[aeio]{1,4}q Bitmap Heap Scan 7885.247 ms
trgm_regex azjunk6 x[aeio]{1,4}q Bitmap Heap Scan 8799.082 ms
trgm_regex azjunk6 x[aeio]{1,4}q Bitmap Heap Scan 7754.152 ms
trgm_regex azjunk6 x[aeio]{1,4}q Bitmap Heap Scan 7721.332 ms

This is with your patch as is; in instances compiled with higher MAX_COLOR_CHARS (I did 6 and 9),
it is of course even easier to dream up a slow regex...

Erik Rijkers

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Hitoshi Harada 2012-01-21 07:49:11 Re: Finer Extension dependencies
Previous Message Alexander Korotkov 2012-01-21 05:29:56 Re: WIP: index support for regexp search