Re: Row pattern recognition

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: jchampion(at)timescale(dot)com
Cc: er(at)xs4all(dot)nl, vik(at)postgresfriends(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2023-09-12 06:18:43
Message-ID: 20230912.151843.1810014897541305662.t-ishii@sranhm.sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>> What about leaving this (reevaluation) for now? Because:
>>
>> 1) we don't have CLASSIFIER
>> 2) we don't allow to give CLASSIFIER to PREV as its arggument
>>
>> so I think we don't need to worry about this for now.
>
> Sure. I'm all for deferring features to make it easier to iterate; I
> just want to make sure the architecture doesn't hit a dead end. Or at
> least, not without being aware of it.

Ok, let's defer this issue. Currently the patch already exceeds 3k
lines. I am afraid too big patch cannot be reviewed by anyone, which
means it will never be committed.

> Also: is CLASSIFIER the only way to run into this issue?

Good question. I would like to know.

>> What if we don't follow the standard, instead we follow POSIX EREs? I
>> think this is better for users unless RPR's REs has significant merit
>> for users.
>
> Piggybacking off of what Vik wrote upthread, I think we would not be
> doing ourselves any favors by introducing a non-compliant
> implementation that performs worse than a traditional NFA. Those would
> be some awful bug reports.

What I am not sure about is, you and Vik mentioned that the
traditional NFA is superior that POSIX NFA in terms of performance.
But how "lexicographic ordering" is related to performance?

>> I am not sure if we need to worry about this because of the reason I
>> mentioned above.
>
> Even if we adopted POSIX NFA semantics, we'd still have to implement
> our own parser for the PATTERN part of the query. I don't think
> there's a good way for us to reuse the parser in src/backend/regex.

Ok.

>> > Does that seem like a workable approach? (Worst-case, my code is just
>> > horrible, and we throw it in the trash.)
>>
>> Yes, it seems workable. I think for the first cut of RPR needs at
>> least the +quantifier with reasonable performance. The current naive
>> implementation seems to have issue because of exhaustive search.
>
> +1

BTW, attched is the v6 patch. The differences from v5 include:

- Now aggregates can be used with RPR. Below is an example from the
regression test cases, which is added by v6 patch.

- Fix assersion error pointed out by Erik.

SELECT company, tdate, price,
first_value(price) OVER w,
last_value(price) OVER w,
max(price) OVER w,
min(price) OVER w,
sum(price) OVER w,
avg(price) OVER w,
count(price) OVER w
FROM stock
WINDOW w AS (
PARTITION BY company
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (START UP+ DOWN+)
DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);
company | tdate | price | first_value | last_value | max | min | sum | avg | count
----------+------------+-------+-------------+------------+------+-----+------+-----------------------+-------
company1 | 07-01-2023 | 100 | 100 | 140 | 200 | 100 | 590 | 147.5000000000000000 | 4
company1 | 07-02-2023 | 200 | | | | | | |
company1 | 07-03-2023 | 150 | | | | | | |
company1 | 07-04-2023 | 140 | | | | | | |
company1 | 07-05-2023 | 150 | | | | | | |
company1 | 07-06-2023 | 90 | 90 | 120 | 130 | 90 | 450 | 112.5000000000000000 | 4
company1 | 07-07-2023 | 110 | | | | | | |
company1 | 07-08-2023 | 130 | | | | | | |
company1 | 07-09-2023 | 120 | | | | | | |
company1 | 07-10-2023 | 130 | | | | | | |
company2 | 07-01-2023 | 50 | 50 | 1400 | 2000 | 50 | 4950 | 1237.5000000000000000 | 4
company2 | 07-02-2023 | 2000 | | | | | | |
company2 | 07-03-2023 | 1500 | | | | | | |
company2 | 07-04-2023 | 1400 | | | | | | |
company2 | 07-05-2023 | 1500 | | | | | | |
company2 | 07-06-2023 | 60 | 60 | 1200 | 1300 | 60 | 3660 | 915.0000000000000000 | 4
company2 | 07-07-2023 | 1100 | | | | | | |
company2 | 07-08-2023 | 1300 | | | | | | |
company2 | 07-09-2023 | 1200 | | | | | | |
company2 | 07-10-2023 | 1300 | | | | | | |
(20 rows)

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

Attachment Content-Type Size
v6-0001-Row-pattern-recognition-patch-for-raw-parser.patch text/x-patch 20.7 KB
v6-0002-Row-pattern-recognition-patch-parse-analysis.patch text/x-patch 11.6 KB
v6-0003-Row-pattern-recognition-patch-planner.patch text/x-patch 4.8 KB
v6-0004-Row-pattern-recognition-patch-executor.patch text/x-patch 34.9 KB
v6-0005-Row-pattern-recognition-patch-docs.patch text/x-patch 8.2 KB
v6-0006-Row-pattern-recognition-patch-tests.patch text/x-patch 35.3 KB
v6-0007-Allow-to-print-raw-parse-tree.patch text/x-patch 749 bytes

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2023-09-12 06:27:35 Re: Make --help output fit within 80 columns per line
Previous Message Michael Paquier 2023-09-12 06:18:05 Re: Make psql's qeury canceling test simple by using signal() routine of IPC::Run