Re: WIP: index support for regexp search

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Erik Rijkers <er(at)xs4all(dot)nl>
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: WIP: index support for regexp search
Date: 2012-01-21 05:26:36
Message-ID: CAPpHfdv5xmBoTCkxuFBnD8LFtD8-DALQaqrGT7tRLLHKYg-OSQ@mail.gmail.com
Lists: pgsql-hackers

Hi!

Thank you for your feedback!

On Fri, Jan 20, 2012 at 3:33 AM, Erik Rijkers <er(at)xs4all(dot)nl> wrote:

> The patch yields spectacular speedups with small, simple-enough regexen.
> But it does not do a good enough job when guessing where to use the index
> and where fall back to Seq Scan. This can lead to (also spectacular)
> slow-downs, compared to Seq Scan.
>
Could you give some examples of regexes where index scan becomes slower
than seq scan?

> I guessed that MAX_COLOR_CHARS limits the character class size (to 4, in
> your patch), is that true? I can understand you want that value to be low
> to limit the above risk, but now it reduces the usability of the feature
> a bit: one has to split up larger char-classes into several smaller ones
> to make a statement use the index: i.e.:
>
Yes, MAX_COLOR_CHARS is the maximum number of characters an automaton color
may contain for that color to be split into separate characters. There is
likely a better solution than just having this hard limit.
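
To illustrate the limit, here is a minimal sketch, not code from the patch.
The helper name color_is_expandable and the representation of a color as a
C string are assumptions made only for this example; the value 4 is the one
Erik reports above. What happens to colors that exceed the limit is not
shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Value reported for the patch in this thread; a hard limit. */
#define MAX_COLOR_CHARS 4

/*
 * Hypothetical helper (not from the patch): a color is modeled as a
 * NUL-terminated string holding every character that belongs to it.
 * Returns true when the color is small enough to be split into
 * separate characters.
 */
static bool
color_is_expandable(const char *color_chars)
{
    assert(color_chars != NULL);
    return strlen(color_chars) <= MAX_COLOR_CHARS;
}
```

Under this sketch a class such as [abcd] stays within the limit, while
[abcde] does not, which is why larger classes currently have to be split by
hand.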

> Btw, it seems impossible to Ctrl-C out of a search once it is submitted;
> I suppose this is normally necessary for performance reasons, but it
> would be useful to be able to compile a test version that allows it. I
> don't know how hard that would be.
>
It seems that Ctrl-C was impossible because the trigram extraction
procedure takes very long and is not interruptible. It's not difficult to
make this procedure interruptible, but really it just shouldn't take so
long.
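
As a rough standalone sketch of what "breakable" means here (not the patch
code): a long loop polls a flag that a SIGINT handler sets. The names
cancel_requested, handle_sigint and run_breakable are made up for this
example. Inside the backend the usual way to do this is to call the
CHECK_FOR_INTERRUPTS() macro at suitable points in the loop rather than
installing a handler by hand.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>

/* Set asynchronously by the signal handler, polled by the loop. */
static volatile sig_atomic_t cancel_requested = 0;

static void
handle_sigint(int signo)
{
    (void) signo;
    cancel_requested = 1;
}

/*
 * Hypothetical stand-in for a long-running extraction loop.
 * Stores the number of completed steps in *done and returns true
 * only if all nsteps steps ran without a cancel request.
 */
static bool
run_breakable(size_t nsteps, size_t *done)
{
    size_t i;

    assert(done != NULL);
    for (i = 0; i < nsteps; i++)
    {
        if (cancel_requested)
            break;              /* stop early at a safe point */
        /* one unit of work would go here */
    }
    *done = i;
    return i == nsteps;
}
```

The check costs one flag read per iteration, so it makes the loop
cancellable but does nothing about the real problem of the loop running
too long.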

> There is also a minor bug, I think, when running with 'set
> enable_seqscan=off' in combination with a too-large regex:
>
Thanks for pointing this out. It will be fixed.

------
With best regards,
Alexander Korotkov.
