Re: TODO item: Implement Boyer-Moore searching in LIKE queries

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Karan Sikka <karanssikka(at)gmail(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: TODO item: Implement Boyer-Moore searching in LIKE queries
Date: 2016-08-12 17:27:56
Message-ID: 6654.1471022876@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Karan Sikka <karanssikka(at)gmail(dot)com> writes:
>> Having said that, I've had a bee in my bonnet for a long time about
>> removing per-row setup cost for repetitive regex matches, and
>> whatever infrastructure that needs would work for this too.

> What are the per-row setup costs for regex matches?

Well, they're pretty darn high if you have more active regexps than will
fit in that cache, and even if you don't, the cache lookup seems a bit
inefficient. What I'd really like to do is get rid of that cache in favor
of having a way to treat a precompiled regexp as a constant. I think this
is probably possible via inventing a "regexp" datatype, which we make the
declared RHS input type for the ~ operator, and give it an implicit cast
from text so that existing queries don't break. The compiled regexp tree
structure contains pointers so it could never go to disk, but now that
we have the "expanded datum" infrastructure you could imagine that the
on-disk representation is the same as text but we support adding a
compiled tree to it in-memory.

Or maybe we just need a smarter cache mechanism in regexp.c. A cache
like that might be the only way to deal with a query using variable
patterns (e.g, pattern argument coming from a table column). But it
seems like basically the wrong approach for the common case of a
constant pattern.

regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Jeff Janes 2016-08-12 17:44:53 Re: Btree Index on PostgreSQL and Wiredtiger (MongoDB3.2)
Previous Message Karan Sikka 2016-08-12 17:12:10 Re: TODO item: Implement Boyer-Moore searching in LIKE queries