Re: Row pattern recognition

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: vik(at)postgresfriends(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2023-06-26 01:05:20
Message-ID: 20230626.100520.2022209340496571373.t-ishii@sranhm.sra.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> I have been dreaming of and lobbying for someone to pick up this
> feature. I will be sure to review it from a standards perspective and
> will try my best to help with the technical aspect, but I am not sure
> to have the qualifications for that.
>
> THANK YOU!

Thank you for looking into my proposal.

>> (I know SQL:2023 is already out, but I don't have access to it)
>
> If you can, try to get ISO/IEC 19075-5 which is a guide to RPR instead
> of just its technical specification.
>
> https://www.iso.org/standard/78936.html

Thanks for the info.

> I don't understand this. RPR in a window specification limits the
> window to the matched rows, so this looks like your rpr() function is
> just the regular first_value() window function that we already have?

No, rpr() is different from first_value(). rpr() returns the argument
value at the first row in a frame only when matched rows found. On the
other hand first_value() returns the argument value at the first row
in a frame unconditionally.

company | tdate | price | rpr | first_value
----------+------------+-------+------+-------------
company1 | 2023-07-01 | 100 | | 100
company1 | 2023-07-02 | 200 | 200 | 200
company1 | 2023-07-03 | 150 | 150 | 150
company1 | 2023-07-04 | 140 | | 140
company1 | 2023-07-05 | 150 | 150 | 150
company1 | 2023-07-06 | 90 | | 90
company1 | 2023-07-07 | 110 | | 110
company1 | 2023-07-08 | 130 | | 130
company1 | 2023-07-09 | 120 | | 120
company1 | 2023-07-10 | 130 | | 130

For example, a frame starting with (tdate = 2023-07-02, price = 200)
consists of rows (price = 200, 150, 140, 150) satisfying the pattern,
thus rpr returns 200. Since in this example frame option "ROWS BETWEEN
CURRENT ROW AND UNBOUNDED FOLLOWING" is specified, next frame starts
with (tdate = 2023-07-03, price = 150). This frame satisfies the
pattern too (price = 150, 140, 150), and rpr retus 150... and so on.

> As in your example, you cannot have START.price outside of the window
> specification; it can only go in the MEASURES clause. Only startprice
> is allowed outside and it gets its qualification from the OVER. Using
> w.startprice might have been better but that would require window
> names to be in the same namespace as range tables.
>
> This currently works in Postgres:
>
> SELECT RANK() OVER w
> FROM (VALUES (1)) AS w (x)
> WINDOW w AS (ORDER BY w.x);

Interesting.

>> o SUBSET is not supported
>
> Is this because you haven't done it yet, or because you ran into
> problems trying to do it?

Because it seems SUBSET is not useful without MEASURES support. Thus
my plan is, firstly implement MEASURES, then SUBSET. What do you
think?

>> o Regular expressions other than "+" are not supported
>
> This is what I had a hard time imagining how to do while thinking
> about it. The grammar is so different here and we allow many more
> operators (like "?" which is also the standard parameter symbol).
> People more expert than me will have to help here.

Yes, that is a problem.

> In this case, we should require the user to specify AFTER MATCH SKIP
> TO NEXT ROW so that behavior doesn't change when we implement the
> standard default. (Your patch might do this already.)

Agreed. I will implement AFTER MATCH SKIP PAST LAST ROW in the next
patch and I will change the default to AFTER MATCH SKIP PAST LAST ROW.

>> o Aggregate functions associated with window clause using RPR do not
>> respect RPR
>
> I do not understand what this means.

Ok, let me explain. See example below. In my understanding "count"
should retun the number of rows in a frame restriced by the match
condition. For example at the first line (2023-07-01 | 100) count
returns 10. I think this should be 0 because the "restriced" frame
starting at the line contains no matched row. On the other hand the
(restricted) frame starting at second line (2023-07-02 | 200) contains
4 rows, thus count should return 4, instead of 9.

SELECT company, tdate, price, rpr(price) OVER w, count(*) OVER w FROM stock
WINDOW w AS (
PARTITION BY company
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
PATTERN (START DOWN+ UP+)
DEFINE
START AS TRUE,
UP AS price > PREV(price),
DOWN AS price < PREV(price)
);

company | tdate | price | rpr | count
----------+------------+-------+------+-------
company1 | 2023-07-01 | 100 | | 10
company1 | 2023-07-02 | 200 | 200 | 9
company1 | 2023-07-03 | 150 | 150 | 8
company1 | 2023-07-04 | 140 | | 7
company1 | 2023-07-05 | 150 | 150 | 6
company1 | 2023-07-06 | 90 | | 5
company1 | 2023-07-07 | 110 | | 4
company1 | 2023-07-08 | 130 | | 3
company1 | 2023-07-09 | 120 | | 2
company1 | 2023-07-10 | 130 | | 1

>> It seems RPR in the standard is quite complex. I think we can start
>> with a small subset of RPR then we could gradually enhance the
>> implementation.
>
> I have no problem with that as long as we don't paint ourselves into a
> corner.

Totally agreed.

>> Comments and suggestions are welcome.
>
> I have not looked at the patch yet, but is the reason for doing R020
> before R010 because you haven't done the MEASURES clause yet?

One of the reasons is, implementing MATCH_RECOGNIZE (R010) looked
harder for me because modifying main SELECT clause could be a hard
work. Another reason is, I had no idea how to implement PREV/NEXT in
other than in WINDOW clause. Other people might feel differently
though.

> In any case, I will be watching this with a close eye, and I am eager
> to help in any way I can.

Thank you! I am looking forward to comments on my patch. Also any
idea how to implement MEASURES clause is welcome.

Best reagards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese:http://www.sraoss.co.jp

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alena Rybakina 2023-06-26 01:47:43 Re: POC, WIP: OR-clause support for indexes
Previous Message Ranier Vilela 2023-06-26 00:55:04 Re: Speeding Up Bitmapset