Re: WIP: index support for regexp search

From:	Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>
Cc:	pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: WIP: index support for regexp search
Date:	2012-01-19 20:30:20
Message-ID:	4F187D5C.30701@enterprisedb.com
Lists:	pgsql-hackers

On 22.11.2011 21:38, Alexander Korotkov wrote:
> A WIP patch with index support for regexp search for the pg_trgm contrib
> module is attached.
> Unlike techniques that extract contiguous text parts from the regexp,
> this patch uses automaton transformation. That allows more
> comprehensive trigram extraction.
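
For readers unfamiliar with the simpler technique being contrasted here, the following is a minimal sketch of it, not the patch's automaton-based algorithm. It assumes a tiny regexp subset (literal characters, `.` and `.*` only), ignores pg_trgm's lowercasing and word padding, and uses hypothetical helper names (`trigrams`, `required_trigrams`, `search`). Every literal run must appear in any matching string, so its trigrams can serve as an index filter; the regexp itself is then rechecked on the surviving rows.

```python
import re

# Metacharacters this sketch refuses to handle (assumption: tiny regexp subset).
META = set("\\^$*+?()[]{}|")


def trigrams(text):
    """All 3-character substrings of text (no padding, unlike pg_trgm)."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def required_trigrams(pattern):
    """Trigrams that every match of pattern must contain.

    Supports only literals, '.' and '.*'; anything else raises ValueError.
    """
    runs = pattern.replace(".*", ".").split(".")
    required = set()
    for run in runs:
        if any(ch in META for ch in run):
            raise ValueError("unsupported regexp syntax in: " + pattern)
        required |= trigrams(run)
    return required


def search(pattern, rows):
    """Filter rows by required trigrams, then recheck with the real regexp."""
    required = required_trigrams(pattern)
    regex = re.compile(pattern)
    return [row for row in rows
            if required <= trigrams(row) and regex.search(row)]


rows = ["hello big world", "help the world", "world says hello"]
print(search("hello.*world", rows))  # ['hello big world']
```

The second row is rejected by the trigram filter alone, while the third passes the filter and is rejected only by the recheck. A pattern such as `a.b` yields no trigrams at all, so every row has to be rechecked.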

Nice!

> The current version of the patch has some limitations:
> 1) The algorithm for extracting a logical expression on trigrams has high
> computational complexity, so it can become really slow on regexps with many
> branches. Improvements to this algorithm are probably possible.
> 2) There is no performance benefit if no trigrams can be extracted from the
> regexp. That is inevitable.
> 3) Currently, only GIN indexes are supported. There are no serious obstacles;
> the GiST code for it is just not written yet.
> 4) There appears to be a problem with extracting multibyte-encoded
> characters from pg_wchar. I've posted a question about it here:
> http://archives.postgresql.org/pgsql-hackers/2011-11/msg01222.php
> For now I've hardcoded a dirty workaround, so
> PG_EUC_JP, PG_EUC_CN, PG_EUC_KR, PG_EUC_TW, PG_EUC_JIS_2004 are not
> supported yet.
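
As an illustration of how limitation 1 could arise, here is a small sketch under stated assumptions. It is not taken from the patch, and whether the patch expands expressions this way is not known from this message. A regexp is modeled as a sequence of groups, each a list of alternative literal strings, and the trigram condition is written as an OR of ANDs, one conjunction per combination of branches. The helper names (`trigrams`, `dnf`) are hypothetical, and trigrams spanning group boundaries are ignored for simplicity.

```python
from itertools import product


def trigrams(text):
    """All 3-character substrings of text (no padding, unlike pg_trgm)."""
    return frozenset(text[i:i + 3] for i in range(len(text) - 2))


def dnf(groups):
    """OR-of-ANDs trigram condition for a sequence of alternation groups.

    groups is a list of lists of literal alternatives, for example
    [["foo", "bar"], ["baz", "qux"]] for the regexp (foo|bar)(baz|qux).
    Each returned frozenset is one conjunction of required trigrams.
    """
    return {
        frozenset().union(*(trigrams(lit) for lit in combo))
        for combo in product(*groups)
    }


small = dnf([["foo", "bar"], ["baz", "qux"]])
print(len(small))  # 4

# Ten two-way branches with distinct literals: the expansion doubles per group.
big = dnf([["a%dx" % i, "b%dx" % i] for i in range(10)])
print(len(big))  # 1024
```

In this naive expansion the number of conjunctions is the product of the branch counts, which is one way a regexp with many branches can become expensive to analyze.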

This is pretty far from being in a committable state, so I'm going to mark
this as "returned with feedback" in the commitfest app. The feedback:

The code badly needs comments. There is no explanation of how the
trigram extraction code in trgm_regexp.c works. Guessing from the
variable names, it seems to be some sort of coloring algorithm that
works on a graph, but that all needs to be explained. Can this algorithm
be found somewhere in the literature, perhaps? A link to a paper would be nice.

Apart from that, the multibyte issue seems like the big one. Any way
around that?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com

In response to

WIP: index support for regexp search at 2011-11-22 19:38:37 from Alexander Korotkov

Responses

Re: WIP: index support for regexp search at 2012-01-19 20:54:24 from Alexander Korotkov
Re: WIP: index support for regexp search at 2012-01-19 23:33:55 from Erik Rijkers
