| From: | SungJun Jang <sjjang112233(at)gmail(dot)com> |
|---|---|
| To: | Tatsuo Ishii <ishii(at)postgresql(dot)org> |
| Cc: | assam258(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org |
| Subject: | Re: Row pattern recognition |
| Date: | 2026-03-17 04:59:24 |
| Message-ID: | CAE+cgNgSG6oiAXT=FL+gK71Squ_eacyTCxSVXaAUTwODrkzqFg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi hackers,
Thank you for the excellent report! I am glad to see that PostgreSQL
> RPR is much faster than Trino, especially in the match failure cases.
>
> Is it possible to share the data generation script and the query for
> PostgreSQL so that I could locally perform the tests?
Here is a self-contained guide to reproduce the RPR ABCD pattern test
locally
using PostgreSQL, Trino, and Oracle.
Requirements:
PostgreSQL 19devel build with the RPR patch applied (local)
Trino and Oracle can be set up via Docker (see README for details):
https://github.com/assam258-5892/docker-databases
Start all services:
cd ~/docker-databases && docker compose up -d trino-service oracle-service
Note: docker compose up does not pull images automatically if they are
already
cached locally. To fetch the latest images, run docker compose pull first.
Note: The Oracle image requires an Oracle account. Register at
https://container-registry.oracle.com, accept the license for the database
image, then log in before pulling:
docker login container-registry.oracle.com
Connect to each database shell:
Trino: docker compose exec -it trino-service trino
Oracle: docker compose exec -it oracle-service sqlplus / as sysdba
Step 1: Create the test table and data
PostgreSQL (1x scale, 20,000 rows):
DROP TABLE IF EXISTS abcd_test;
CREATE TABLE abcd_test AS
SELECT v,
CASE
WHEN v % 10000 < 3333 THEN 'A'
WHEN v % 10000 >= 3333 AND v % 10000 < 6666 THEN 'B'
WHEN v % 10000 >= 6666 AND v % 10000 < 9999 THEN 'C'
WHEN v % 10000 = 9999 THEN 'D'
END AS cat
FROM generate_series(0, 19999) AS v;
ANALYZE abcd_test;
Trino (1x scale, 20,000 rows):
CREATE SCHEMA IF NOT EXISTS memory.test;
DROP TABLE IF EXISTS memory.test.abcd_test;
CREATE TABLE memory.test.abcd_test AS
WITH nums AS (
SELECT a.v * 10000 + b.v AS v
FROM UNNEST(sequence(0, 1)) AS a(v)
CROSS JOIN UNNEST(sequence(0, 9999)) AS b(v)
)
SELECT CAST(v AS INTEGER) AS v,
CASE
WHEN v % 10000 < 3333 THEN 'A'
WHEN v % 10000 >= 3333 AND v % 10000 < 6666 THEN 'B'
WHEN v % 10000 >= 6666 AND v % 10000 < 9999 THEN 'C'
WHEN v % 10000 = 9999 THEN 'D'
END AS cat
FROM nums;
Note: Trino sequence() is limited to 10,000 elements per call, so a CROSS
JOIN
is used. For scale Sx, change sequence(0, 1) to sequence(0, S*2-1).
Oracle (1x scale, 20,000 rows):
DROP TABLE abcd_test PURGE;
CREATE TABLE abcd_test AS
SELECT v,
CASE
WHEN MOD(v, 10000) < 3333 THEN 'A'
WHEN MOD(v, 10000) >= 3333 AND MOD(v, 10000) < 6666 THEN 'B'
WHEN MOD(v, 10000) >= 6666 AND MOD(v, 10000) < 9999 THEN 'C'
WHEN MOD(v, 10000) = 9999 THEN 'D'
END AS cat
FROM (SELECT LEVEL - 1 AS v FROM dual CONNECT BY LEVEL <= 20000);
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'ABCD_TEST');
Verify data distribution (all engines):
SELECT cat, COUNT(*) AS cnt FROM abcd_test GROUP BY cat ORDER BY cat;
Expected (1x): A=6666, B=6666, C=6666, D=2
Step 2: Run Test 1 — A+ B+ C+ D (match expected)
Expected: 2 rows returned (one match per segment)
PostgreSQL:
SELECT match_first, match_last, match_len
FROM (
SELECT v,
first_value(v) OVER w AS match_first,
last_value(v) OVER w AS match_last,
count(*) OVER w AS match_len
FROM abcd_test
WINDOW w AS (
ORDER BY v
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
PATTERN (A+ B+ C+ D)
DEFINE
A AS cat = 'A',
B AS cat = 'B',
C AS cat = 'C',
D AS cat = 'D'
)
) result
WHERE match_len > 0;
Trino / Oracle:
SELECT match_first, match_last, match_len
FROM abcd_test
MATCH_RECOGNIZE (
ORDER BY v
MEASURES
FIRST(v) AS match_first,
LAST(v) AS match_last,
COUNT(*) AS match_len
ONE ROW PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (A+ B+ C+ D)
DEFINE
A AS cat = 'A',
B AS cat = 'B',
C AS cat = 'C',
D AS cat = 'D'
) mr;
Note: Trino uses memory.test.abcd_test as the table name.
Expected result (1x):
match_first | match_last | match_len
------------|------------|----------
0 | 9999 | 10000
10000 | 19999 | 10000
Step 3: Run Test 2 — A+ B+ C+ E (match failure)
Expected: 0 rows (E does not exist)
Use the same queries as Test 1 with two changes:
PATTERN: (A+ B+ C+ D) → (A+ B+ C+ E)
DEFINE: cat = 'D' → cat = 'E'
Warning: Trino Test 2 at 1x scale takes approximately 5-6 minutes.
Step 4: Scale up (optional)
Re-create the test table at 2x scale (40,000 rows) and then repeat Step 2
and
Step 3.
PostgreSQL (2x scale, 40,000 rows):
DROP TABLE IF EXISTS abcd_test;
CREATE TABLE abcd_test AS
SELECT v,
CASE
WHEN v % 20000 < 6666 THEN 'A'
WHEN v % 20000 >= 6666 AND v % 20000 < 13332 THEN 'B'
WHEN v % 20000 >= 13332 AND v % 20000 < 19999 THEN 'C'
WHEN v % 20000 = 19999 THEN 'D'
END AS cat
FROM generate_series(0, 39999) AS v;
ANALYZE abcd_test;
Trino (2x scale, 40,000 rows):
CREATE SCHEMA IF NOT EXISTS memory.test;
DROP TABLE IF EXISTS memory.test.abcd_test;
CREATE TABLE memory.test.abcd_test AS
WITH nums AS (
SELECT a.v * 10000 + b.v AS v
FROM UNNEST(sequence(0, 3)) AS a(v)
CROSS JOIN UNNEST(sequence(0, 9999)) AS b(v)
)
SELECT CAST(v AS INTEGER) AS v,
CASE
WHEN v % 20000 < 6666 THEN 'A'
WHEN v % 20000 >= 6666 AND v % 20000 < 13332 THEN 'B'
WHEN v % 20000 >= 13332 AND v % 20000 < 19999 THEN 'C'
WHEN v % 20000 = 19999 THEN 'D'
END AS cat
FROM nums;
Oracle (2x scale, 40,000 rows):
DROP TABLE abcd_test PURGE;
CREATE TABLE abcd_test AS
SELECT v,
CASE
WHEN MOD(v, 20000) < 6666 THEN 'A'
WHEN MOD(v, 20000) >= 6666 AND MOD(v, 20000) < 13332 THEN 'B'
WHEN MOD(v, 20000) >= 13332 AND MOD(v, 20000) < 19999 THEN 'C'
WHEN MOD(v, 20000) = 19999 THEN 'D'
END AS cat
FROM (SELECT LEVEL - 1 AS v FROM dual CONNECT BY LEVEL <= 40000);
EXEC DBMS_STATS.GATHER_TABLE_STATS(USER, 'ABCD_TEST');
Expected (2x): A=13332, B=13332, C=13332, D=4
Then run Step 2 and Step 3 as-is.
Warning: Trino Test 2 at 2x scale takes approximately 20-25 minutes.
Please let me know if you encounter any issues reproducing this.
Best regards
SungJun
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Amit Kapila | 2026-03-17 05:02:42 | Re: Report bytes and transactions actually sent downtream |
| Previous Message | Ashutosh Bapat | 2026-03-17 04:57:24 | Re: SQL Property Graph Queries (SQL/PGQ) |