Note about bug #14563

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Cc: Alexander Korotkov <a(dot)korotkov(at)postgrespro(dot)ru>
Subject: Note about bug #14563
Date: 2017-02-22 18:19:54
Views: Raw Message | Whole Thread | Download mbox
Lists: pgsql-hackers

I looked into the report in bug #14563 about pg_trgm giving wrong
answers for regexp searches. There is a whole lot of complex mechanism
in trgm_regexp.c for extracting trigram patterns out of regexps, and
so I was afraid that it might be a very subtle bug, but actually the
problem seems to be in an apparently-trivial step. That's the part in
selectColorTrigrams() where we discard trigrams if there are too many,
fixing the graph by merging the states linked by a discarded trigram.
We are not supposed to merge initial and final states, because if we
did the graph would be broken, and the code that checks for that
looks simple enough ... but it's wrong. Consider

Sinit ---> Sx ---> Sfin

where the two arcs depicted are labeled by the same color trigram.
We will examine each arc and decide that it doesn't link the init
and fin states, and thereby decide that it's okay to remove C.
But removing the first arc merges Sx into Sinit, and removing the
second arc merges Sfin into Sx and thereby into Sinit, and we've
blown it.

I think a reasonable fix for this is to add a "tentative parent" field
into struct TrgmState that is kept NULL except during this specific logic.
In the loop looking for disallowed merges, set "tentative parent" to
indicate a merge we would make if successful, and chase up tentative as
well as real parent links. Then reset all the tentative links before
moving on, whether or not the merge is allowed. In the example above,
we'd set Sx's tentative parent to Sinit, then while looking at the second
arc we would chase up from Sx to Sinit and realize we can't merge.

It seems like a good idea, as well, for either mergeStates() or its
sole caller to have an Assert, if not indeed a full-fledged runtime
check, that we aren't merging init and final states.

Also, I notice that the "children" state lists are useless and could just
be dropped. The only thing they're used for at all is reducing the length
of the parent-link chains, but I see no point in that considering the code
isn't relying on those chains to be length 1. It's very unlikely that we
save enough cycles by having shorter chains to pay for the overhead of
allocating and maintaining the children lists.

regards, tom lane

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2017-02-22 18:29:18 Re: Change in "policy" on dump ordering?
Previous Message Mithun Cy 2017-02-22 18:19:48 Re: Proposal : For Auto-Prewarm.