Skip site navigation (1) Skip section navigation (2)

Re: WIP: index support for regexp search

From: "Erik Rijkers" <er(at)xs4all(dot)nl>
To: "Alexander Korotkov" <aekorotkov(at)gmail(dot)com>
Cc: "Heikki Linnakangas" <heikki(dot)linnakangas(at)enterprisedb(dot)com>, "pgsql-hackers" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: index support for regexp search
Date: 2012-01-21 07:03:04
Message-ID: d9141ddf8c4e763f57f41661c9af699d.squirrel@webmail.xs4all.nl (view raw or flat)
Thread:
Lists: pgsql-hackers
On Sat, January 21, 2012 06:26, Alexander Korotkov wrote:
> Hi!
>
> Thank you for your feedback!
>
> On Fri, Jan 20, 2012 at 3:33 AM, Erik Rijkers <er(at)xs4all(dot)nl> wrote:
>
>> The patch yields spectacular speedups with small, simple-enough regexen.
>>  But it does not do a
>> good enough job when guessing where to use the index and where fall back
>> to Seq Scan.  This can
>> lead to (also spectacular) slow-downs, compared to Seq Scan.
>>
> Could you give some examples of regexes where index scan becomes slower
> than seq scan?
>


x[aeio]+q takes many minutes, uninterruptible.  It's now running for almost 30 minutes, I'm sure
it will come back eventually, but I think I'll kill it later on; I suppose you get the point  ;-)

Even with {n,m} quantifiers it's easy to hit slowdowns:

   (table azjunk6 is 112 MB, the index 693 MB.)
   (MAX_COLOR_CHARS=4  <= your original patch)


instance           table    regex                  plan               time

HEAD               azjunk6  x[aeio]{1,3}q          Seq Scan           3566.088 ms
HEAD               azjunk6  x[aeio]{1,3}q          Seq Scan           3540.606 ms
HEAD               azjunk6  x[aeio]{1,3}q          Seq Scan           3495.034 ms
HEAD               azjunk6  x[aeio]{1,3}q          Seq Scan           3510.403 ms
trgm_regex         azjunk6  x[aeio]{1,3}q          Bitmap Heap Scan   3724.131 ms
trgm_regex         azjunk6  x[aeio]{1,3}q          Bitmap Heap Scan   3844.999 ms
trgm_regex         azjunk6  x[aeio]{1,3}q          Bitmap Heap Scan   3835.190 ms
trgm_regex         azjunk6  x[aeio]{1,3}q          Bitmap Heap Scan   3724.016 ms


HEAD               azjunk6  x[aeio]{1,4}q          Seq Scan           3617.997 ms
HEAD               azjunk6  x[aeio]{1,4}q          Seq Scan           3644.215 ms
HEAD               azjunk6  x[aeio]{1,4}q          Seq Scan           3636.976 ms
HEAD               azjunk6  x[aeio]{1,4}q          Seq Scan           3625.493 ms
trgm_regex         azjunk6  x[aeio]{1,4}q          Bitmap Heap Scan   7885.247 ms
trgm_regex         azjunk6  x[aeio]{1,4}q          Bitmap Heap Scan   8799.082 ms
trgm_regex         azjunk6  x[aeio]{1,4}q          Bitmap Heap Scan   7754.152 ms
trgm_regex         azjunk6  x[aeio]{1,4}q          Bitmap Heap Scan   7721.332 ms


This is with your patch as is; in instances compiled with higher MAX_COLOR_CHARS (I did 6 and 9),
it is of course even easier to dream up a slow regex...



Erik Rijkers






In response to

Responses

pgsql-hackers by date

Next:From: Hitoshi HaradaDate: 2012-01-21 07:49:11
Subject: Re: Finer Extension dependencies
Previous:From: Alexander KorotkovDate: 2012-01-21 05:29:56
Subject: Re: WIP: index support for regexp search

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group