Re: Row pattern recognition

From: Jacob Champion <jchampion(at)timescale(dot)com>
To: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
Cc: er(at)xs4all(dot)nl, vik(at)postgresfriends(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2023-09-11 22:13:43
Message-ID: CAAWbhmjq3NY1+Am-QHJ4AFh7mi=2eiiGqj518f3-j-C3EfffPg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Sep 9, 2023 at 4:21 AM Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp> wrote:
> Then we will get for str_set:
> r0: B
> r1: AB
>
> Because r0 only has classifier B, r1 can have A and B. Problem is,
> r2. If we choose A at r1, then r2 = B. But if we choose B at t1, then
> r2 = AB. I guess this is the issue you pointed out.

Right.

> Yeah, probably we have delay evaluation of such pattern variables like
> A, then reevaluate A after the first scan.
>
> What about leaving this (reevaluation) for now? Because:
>
> 1) we don't have CLASSIFIER
> 2) we don't allow to give CLASSIFIER to PREV as its arggument
>
> so I think we don't need to worry about this for now.

Sure. I'm all for deferring features to make it easier to iterate; I
just want to make sure the architecture doesn't hit a dead end. Or at
least, not without being aware of it.

Also: is CLASSIFIER the only way to run into this issue?

> What if we don't follow the standard, instead we follow POSIX EREs? I
> think this is better for users unless RPR's REs has significant merit
> for users.

Piggybacking off of what Vik wrote upthread, I think we would not be
doing ourselves any favors by introducing a non-compliant
implementation that performs worse than a traditional NFA. Those would
be some awful bug reports.

> > - I think we have to implement a parallel parser regardless (RPR PATTERN
> > syntax looks incompatible with POSIX)
>
> I am not sure if we need to worry about this because of the reason I
> mentioned above.

Even if we adopted POSIX NFA semantics, we'd still have to implement
our own parser for the PATTERN part of the query. I don't think
there's a good way for us to reuse the parser in src/backend/regex.

> > Does that seem like a workable approach? (Worst-case, my code is just
> > horrible, and we throw it in the trash.)
>
> Yes, it seems workable. I think for the first cut of RPR needs at
> least the +quantifier with reasonable performance. The current naive
> implementation seems to have issue because of exhaustive search.

+1

Thanks!
--Jacob

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2023-09-11 23:56:14 Re: [PoC] pg_upgrade: allow to upgrade publisher node
Previous Message Thomas Munro 2023-09-11 21:04:59 Re: lockup in parallel hash join on dikkop (freebsd 14.0-current)