Re: BUG #14623: pg_trgm doesn't correctly process some regexp with negative lookahead

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: jeff(dot)janes(at)gmail(dot)com
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14623: pg_trgm doesn't correctly process some regexp with negative lookahead
Date: 2017-04-13 19:24:21
Message-ID: 21076.1492111461@sss.pgh.pa.us
Lists: pgsql-bugs

jeff(dot)janes(at)gmail(dot)com writes:
> This simplified case is easy to reproduce back to at least 9.4, and is still
> present in 10dev HEAD. Matches are missed when using the trgm index, but
> not when doing the full table scan.

> select * from foobar where x ~ 'eldrazi (?!scion)'; -- returns 0 rows

> The trigrams seem to be extracted correctly, but the graph stuffed into
> extra_data is not correct. Looking at /tmp/packed.dot, there are no arrows
> pointing to the successful termination state s1. Instead, I get led to a
> dead-end state s7.

Hm. I think what is happening here is that regexport.c is being too
simplistic by ignoring LACON arcs. Because it does so, there's actually
no path in the exported search NFA that can reach the success state,
which explains your observation of the lack of such a path in
/tmp/packed.dot. It's not entirely unreasonable for pg_trgm to be
assuming that such a path must exist.

What regexport.c should be doing is assuming that every LACON constraint
succeeds, hence treating such an arc as a traversable but
zero-input-consuming arc.

Since there's no notion of zero input consumption in the exported
representation, we're going to need some logic in regexport.c to
convert that (by traversing to the arc target state and emitting
its output arcs, possibly recursively).
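
To make the idea concrete, here is a rough sketch in Python rather than
the C of regexport.c. Everything in it is an assumption for illustration:
the dict-based toy NFA, the LACON marker, and the helper names out_arcs
and reachable are hypothetical and do not correspond to the real
regexport.c data structures or API. It only shows the shape of the
conversion: a LACON arc is assumed to succeed, and since it consumes no
input it is replaced by the out-arcs of its target state, recursively.

```python
# Toy model only: these structures are hypothetical, not regexport.c's.
LACON = object()  # marker for a lookahead-constraint arc

# state -> list of (label, target); state 3 plays the success state.
NFA = {
    0: [("a", 1)],
    1: [(LACON, 2)],   # constraint arc, consumes no input
    2: [("b", 3)],
    3: [],
}


def plain_arcs(state):
    """Too-simplistic export: LACON arcs are silently dropped."""
    return [(lab, tgt) for lab, tgt in NFA.get(state, []) if lab is not LACON]


def out_arcs(nfa, state, _seen=None):
    """Export the arcs of `state`, assuming every LACON succeeds.

    A LACON arc is replaced by the out-arcs of its target state,
    recursively; `_seen` guards against loops of LACON arcs.
    """
    if _seen is None:
        _seen = set()
    if state in _seen:
        return []
    _seen.add(state)
    result = []
    for label, target in nfa.get(state, []):
        if label is LACON:
            candidates = out_arcs(nfa, target, _seen)
        else:
            candidates = [(label, target)]
        for arc in candidates:
            if arc not in result:
                result.append(arc)
    return result


def reachable(start, arcs_of):
    """Set of states reachable from `start` through the exported arcs."""
    seen, stack = {start}, [start]
    while stack:
        for _, target in arcs_of(stack.pop()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


if __name__ == "__main__":
    print(3 in reachable(0, plain_arcs))
    print(3 in reachable(0, lambda s: out_arcs(NFA, s)))
```

With the LACON arc dropped, the success state cannot be reached from the
start state; with it folded into its source state, it can. The sketch
leaves out one case a real fix would have to consider, namely a LACON arc
that leads directly to the success state.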

> If I change the regexp to 'eldrazi (?!s)', then the bug goes away, and
> /tmp/packed.dot shows the correct graph pointing to s1.

That seems to be because processlacons will simplify a single-character
LACON into a plain AHEAD/BEHIND constraint, which leads to an NFA that
doesn't confuse the export logic.

regards, tom lane
