Re: Row pattern recognition

From: Jacob Champion <jchampion(at)timescale(dot)com>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: pgsql-hackers(at)postgresql(dot)org, vik(at)postgresfriends(dot)org
Subject: Re: Row pattern recognition
Date: 2023-07-20 23:36:37
Message-ID: 022a392f-2a9e-3df1-a509-a5bd51160fa5@timescale.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Ishii-san,

On 7/19/23 22:15, Tatsuo Ishii wrote:
> Currently my patch has a limitation for the sake of simple
> implementation: a pattern like "A+" is parsed and analyzed in the raw
> parser. This makes subsequent process much easier because the pattern
> element variable (in this case "A") and the quantifier (in this case
> "+") is already identified by the raw parser. However there are much
> more cases are allowed in the standard as you already pointed out. For
> those cases probably we should give up to parse PATTERN items in the
> raw parser, and instead the raw parser just accepts the elements as
> Sconst?

Is there a concern that the PATTERN grammar can't be represented in
Bison? I thought it was all context-free... Or is the concern that the
parse tree of the pattern is hard to feed into a regex engine?

> Any comments, especially on the PREV/NEXT implementation part is
> welcome. Currently the DEFINE expression like "price > PREV(price)" is
> prepared in ExecInitWindowAgg using ExecInitExpr,tweaking var->varno
> in Var node so that PREV uses OUTER_VAR, NEXT uses INNER_VAR. Then
> evaluate the expression in ExecWindowAgg using ExecEvalExpr, setting
> previous row TupleSlot to ExprContext->ecxt_outertuple, and next row
> TupleSlot to ExprContext->ecxt_innertuple. I think this is temporary
> hack and should be gotten ride of before v1 is committed. Better idea?

I'm not familiar enough with this code yet to offer very concrete
suggestions, sorry... But at some point in the future, we need to be
able to skip forward and backward from arbitrary points, like

DEFINE B AS B.price > PREV(FIRST(A.price), 3)

so there won't be just one pair of "previous and next" tuples. Maybe
that can help clarify the design? It feels like it'll need to eventually
be a "real" function that operates on the window state, even if it
doesn't support all of the possible complexities in v1.

--

Taking a closer look at the regex engine:

It looks like the + qualifier has trouble when it meets the end of the
frame. For instance, try removing the last row of the 'stock' table in
rpr.sql; some of the final matches will disappear unexpectedly. Or try a
pattern like

PATTERN ( a+ )
DEFINE a AS TRUE

which doesn't seem to match anything in my testing.

There's also the issue of backtracking in the face of reclassification,
as I think Vik was alluding to upthread. The pattern

PATTERN ( a+ b+ )
DEFINE a AS col = 2,
b AS col = 2

doesn't match a sequence of values (2 2 ...) with the current
implementation, even with a dummy row at the end to avoid the
end-of-frame bug.

(I've attached two failing tests against v2, to hopefully better
illustrate, along with what I _think_ should be the correct results.)

I'm not quite understanding the match loop in evaluate_pattern(). It
looks like we're building up a string to pass to the regex engine, but
by the we call regexp_instr, don't we already know whether or not the
pattern will match based on the expression evaluation we've done?

Thanks,
--Jacob

Attachment Content-Type Size
tests.diff.txt text/plain 3.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Vik Fearing 2023-07-21 00:07:44 Re: Row pattern recognition
Previous Message John Morris 2023-07-20 23:13:29 Re: Atomic ops for unlogged LSN