Re: WIP: index support for regexp search

From: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
To: Erikjan Rijkers <er(at)xs4all(dot)nl>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stěhule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: WIP: index support for regexp search
Date: 2013-04-03 09:18:27
Message-ID: CAPpHfdsitdJZNyQk5UCK0sAh1F08147pP7DkDH_Gh_Men8ofxw@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 3, 2013 at 11:10 AM, Erikjan Rijkers <er(at)xs4all(dot)nl> wrote:

> On Tue, April 2, 2013 23:54, Alexander Korotkov wrote:
>
> > [trgm-regexp-0.15.patch.gz]
>
> Yes, it does look good now; Attached a list of measurements. Most of the
> searches that I put in that test-program are now speeded up very much.
>
> There still are a few regressions, for example:
>
> HEAD          azjunk6  x[aeiou]{4,5}q      83  Seq Scan           1393.465 ms
> trgm_regex15  azjunk6  x[aeiou]{4,5}q      83  Bitmap Heap Scan   1728.319 ms
>
> HEAD          azjunk7  x[aeiou]{1,3}q  190031  Seq Scan          16819.555 ms
> trgm_regex15  azjunk7  x[aeiou]{1,3}q  190031  Bitmap Heap Scan  21286.804 ms
>
> Not exactly negligible, and ideally those regressions would be removed but
> with the huge advantages for other cases I'd say it's worth it.
>

Thank you for testing!
Exploring the results in more detail, I found version 13 to be buggy. That
version is a dead end; the API is quite different now. Could you use v12
instead of v13 in the comparison, please?
When we have a regression compared with HEAD, it is for one of two reasons:
1) We select an index scan in both cases, but with the patch we spend more
time on analysis. This is an inevitable disadvantage of any index; we can
only make sure that the analysis doesn't take too long. The current testing
results don't show this reason to be significant.
2) Sometimes we select an index scan where a sequential scan would be
faster. This is also an inevitable disadvantage until we have relevant
statistics. We already have a similar situation, for example, with in-core
geometric search and LIKE/ILIKE search in pg_trgm. The situation could
probably be improved somewhat even without such statistics, but I don't
think we can draw such conclusions from synthetic testing, because
improvements for synthetic cases could turn out to be regressions for
real-life cases.
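As a minimal sketch (not part of the patch or of any pg_trgm tooling), the two-reason classification above can be applied mechanically to the two regressions quoted earlier. The `Run` record, its field names, and the rule "same plan in both builds means reason 1, a plan change means reason 2" are illustrative assumptions; only the timings and plan names come from the quoted measurements.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Run:
    # Hypothetical record layout; only the fields needed here are kept.
    build: str    # "HEAD" or the patched build
    table: str
    pattern: str
    plan: str     # plan node as listed in the measurements
    ms: float     # reported time in milliseconds


# The two regressions quoted from the measurements (HEAD vs. patch v15).
PAIRS = [
    (Run("HEAD", "azjunk6", "x[aeiou]{4,5}q", "Seq Scan", 1393.465),
     Run("trgm_regex15", "azjunk6", "x[aeiou]{4,5}q", "Bitmap Heap Scan", 1728.319)),
    (Run("HEAD", "azjunk7", "x[aeiou]{1,3}q", "Seq Scan", 16819.555),
     Run("trgm_regex15", "azjunk7", "x[aeiou]{1,3}q", "Bitmap Heap Scan", 21286.804)),
]


def classify(head: Run, patched: Run) -> tuple[float, str]:
    """Return the slowdown ratio and which of the two reasons applies.

    Simplification (assumption): the same plan in both builds is attributed
    to reason 1 (extra analysis time); a plan change is attributed to
    reason 2 (an index scan chosen where a seq scan was faster).
    """
    ratio = patched.ms / head.ms
    if head.plan == patched.plan:
        reason = "1: same plan, extra analysis time"
    else:
        reason = "2: index scan chosen where a seq scan was faster"
    return ratio, reason


if __name__ == "__main__":
    for head, patched in PAIRS:
        ratio, reason = classify(head, patched)
        print(f"{patched.table} {patched.pattern}: {ratio:.2f}x slower, reason {reason}")
```

Run as a script, it reports both cases as reason 2 (Seq Scan on HEAD versus Bitmap Heap Scan with the patch), with slowdowns of roughly 1.24x and 1.27x.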

------
With best regards,
Alexander Korotkov.
