Re: Row pattern recognition

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: champion(dot)p(at)gmail(dot)com, er(at)xs4all(dot)nl, vik(at)postgresfriends(dot)org
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2023-10-22 02:39:20
Message-ID: 20231022.113920.289851862882439378.t-ishii@sranhm.sra.co.jp
Lists: pgsql-hackers

Attached is the v10 patch. This version improves the performance of
pattern matching. Previously it generated all possible pattern string
candidates, which resulted in an unnecessarily large number of
candidates. For example, if you have 2 pattern variables and the
target frame includes 100 rows, the number of candidates can reach
2^100 in the worst case. To avoid this, the v10 patch prunes
candidates as they are generated. Suppose you have:

PATTERN (A B+ C+)

Candidates like "BAC" or "CAB" cannot survive because they can never
satisfy the search pattern. To judge this, I assign sequence numbers
(0, 1, 2) to (A, B, C). If the pattern generator tries to generate
"BA", this is not allowed because the sequence number of B is 1 and
that of A is 0, and 0 < 1: B cannot be followed by A. Note that this
technique can be applied when the quantifiers are "+" or "*". It may
also work for other quantifiers such as '?' or '{n, m}', but I have
not confirmed this yet because I have not implemented them.
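
To illustrate the pruning rule, here is a minimal Python sketch. It is
not the code in the patch; the function name generate_candidates, the
row_matches input (the set of pattern variables whose DEFINE condition
holds for each row), and the single-letter variable names are all
assumptions made for illustration. It builds one candidate string per
choice of variable per row and, when pruning is on, refuses to append a
variable whose sequence number is lower than that of the previous one.

```python
from typing import Dict, List, Sequence, Set


def generate_candidates(
    row_matches: Sequence[Set[str]],
    seq_no: Dict[str, int],
    prune: bool = True,
) -> List[str]:
    """Build candidate strings, one pattern variable per row.

    row_matches[i] holds the variables assumed to be true for row i.
    With prune=True, a variable is appended only if its sequence number
    is not lower than that of the previously chosen variable.
    """
    candidates = [""]
    for matched in row_matches:
        extended = []
        for prefix in candidates:
            for var in sorted(matched):
                if prune and prefix and seq_no[var] < seq_no[prefix[-1]]:
                    continue  # e.g. "B" followed by "A" can never match
                extended.append(prefix + var)
        candidates = extended
    return candidates


# PATTERN (A B+ C+): sequence numbers follow the order in PATTERN.
SEQ_NO = {"A": 0, "B": 1, "C": 2}

# Hypothetical worst case: every row satisfies every DEFINE condition.
rows = [{"A", "B", "C"}] * 4

unpruned = generate_candidates(rows, SEQ_NO, prune=False)
pruned = generate_candidates(rows, SEQ_NO, prune=True)
print(len(unpruned), len(pruned))  # prints: 81 15
```

In this sketch the survivors still include strings such as "AAAA" that
do not satisfy A B+ C+, so the pruning only narrows the set; each
remaining candidate still has to be checked against the pattern.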

Besides this improvement, I fixed a bug that was present in the
previous and older patches: when an expression in DEFINE uses text
operators, the query errors out:

ERROR: could not determine which collation to use for string comparison
HINT: Use the COLLATE clause to set the collation explicitly.

This was fixed by adding a call to assign_expr_collations() in
transformDefineClause().

I have also updated the documentation, section "3.5. Window Functions":

- Remove the stale mention of rpr(), which is no longer used.
- Enhance the description of DEFINE and PATTERN.
- Mention that the quantifier '*' is supported.

Finally, I have added more test cases to the regression tests:
- a case where the same pattern variable appears twice
- a case for the quantifier '*'

Best regards,
--
Tatsuo Ishii
SRA OSS LLC
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp

Attachment Content-Type Size
v10-0001-Row-pattern-recognition-patch-for-raw-parser.patch text/x-patch 21.1 KB
v10-0002-Row-pattern-recognition-patch-parse-analysis.patch text/x-patch 11.8 KB
v10-0003-Row-pattern-recognition-patch-planner.patch text/x-patch 4.8 KB
v10-0004-Row-pattern-recognition-patch-executor.patch text/x-patch 49.5 KB
v10-0005-Row-pattern-recognition-patch-docs.patch text/x-patch 9.6 KB
v10-0006-Row-pattern-recognition-patch-tests.patch text/x-patch 41.9 KB
v10-0007-Allow-to-print-raw-parse-tree.patch text/x-patch 750 bytes
