Re: WIP: index support for regexp search

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc: Erikjan Rijkers <er(at)xs4all(dot)nl>, Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>, Tomas Vondra <tv(at)fuzzy(dot)cz>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Pavel Stěhule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: WIP: index support for regexp search
Date: 2013-04-09 05:15:06
Message-ID: 6219.1365484506@sss.pgh.pa.us
Lists: pgsql-hackers

Alexander Korotkov <aekorotkov(at)gmail(dot)com> writes:
> On Mon, Apr 8, 2013 at 9:28 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> I spent the weekend hacking on this, making a number of bug fixes and a
>> whole lot of cosmetic changes. I think there are large parts of this
>> that are in committable shape now, but I still find the actual graph
>> transformation logic to be mostly unintelligible. I think what's most
>> obscure is the distinction between the arcs list and the keys list of
>> each state in the expanded graph. I get the impression that the
>> general idea is for the arcs to represent exactly-known transitions
>> while the keys represent imprecisely-known transitions ... but there
>> seems to be at least some leakage between those categories. Could
>> you write down a specification for what's supposed to be happening
>> there?

> Here is my try to specify it.

Thanks. I hacked on this some more and committed it. I found a number
of bugs along the way with respect to handling of word boundaries
(partially-blank transition trigrams) and EOL-color ($) handling.
I think it's all fixed now but it could definitely use some more
study and testing.

One issue that bothered me is that the regression tests really don't
provide much visibility into what the code is doing. Some of the bugs
had to do with failing to generate expected trigrams, for instance
col ~ 'foo bar' only generating trigram "foo" and not "bar". This still
led to getting the right answer, so the error was invisible as far as the
tests were concerned. Is it worth thinking of a way to expose what the
extract function did at SQL level, so we could test more carefully?
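To make the "foo bar" case concrete, here is a minimal sketch in Python of how pg_trgm splits a plain string into trigrams, following its documented rule that each word gets two blanks prepended and one appended. The function name show_trgm only mirrors the SQL function for illustration. It is not the regex-extraction code discussed in this thread, and the ASCII-only, case-folded word splitting is a simplifying assumption (the real module uses locale-aware character classes).

```python
import re


def show_trgm(text: str) -> list[str]:
    """Approximate pg_trgm's show_trgm() for a plain (non-regex) string.

    Assumptions: default pg_trgm behavior (case-folded, every
    non-alphanumeric character acts as a word separator), simplified
    here to ASCII letters and digits only.
    """
    trigrams = set()
    for word in re.findall(r"[0-9a-z]+", text.lower()):
        # Two leading blanks and one trailing blank per word; the
        # first two and the last trigram are "partially blank", which is
        # where word-boundary handling matters.
        padded = "  " + word + " "
        for i in range(len(padded) - 2):
            trigrams.add(padded[i:i + 3])
    return sorted(trigrams)


if __name__ == "__main__":
    # Both full-word trigrams "foo" and "bar" appear, alongside the
    # partially-blank boundary trigrams.
    print(show_trgm("foo bar"))
    # ['  b', '  f', ' ba', ' fo', 'ar ', 'bar', 'foo', 'oo ']
```

A test exposing the extraction step at SQL level could make the same kind of check: that both "foo" and "bar" are among the trigrams required for col ~ 'foo bar', rather than only comparing final query results.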

regards, tom lane
