Re: Row pattern recognition

From: Tatsuo Ishii <ishii(at)postgresql(dot)org>
To: assam258(at)gmail(dot)com
Cc: jacob(dot)champion(at)enterprisedb(dot)com, david(dot)g(dot)johnston(at)gmail(dot)com, vik(at)postgresfriends(dot)org, er(at)xs4all(dot)nl, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2026-02-09 07:05:20
Message-ID: 20260209.160520.873483599710391856.ishii@postgresql.org
Lists: pgsql-hackers

Hi Henson,

> Sorry, I forgot to attach the patch file in my previous email.

Thanks for the patch! It applied cleanly and the regression tests
passed here.

>> On a side note, I feel that the automata characteristics of RPR --
>> state transitions, context management, absorbability, etc. -- tend to
>> require a much larger volume of test cases than typical RDBMS/SQL
>> features. This seems unavoidable given the combinatorial nature of
>> NFA execution.

I agree. A complex feature requires complex tests. We cannot avoid it.

BTW, I have tested RPR with a large partition/reduced frame (up to
100k rows). The following query took nearly 4 minutes on my laptop.

EXPLAIN (ANALYZE)
SELECT aid, count(*) OVER w
FROM generate_series(1,100000) AS g(aid) WHERE aid <= 100000
WINDOW w AS (
ORDER BY aid
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (START UP+)
DEFINE START AS TRUE, UP AS aid > PREV(aid)
);
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=4337.40..4504.08 rows=33333 width=14) (actual time=224171.589..224320.055 rows=100000.00 loops=1)
Window: w AS (ORDER BY aid ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
Pattern: start up+
Storage: Disk Maximum Storage: 4096kB
NFA States: 200000 peak, 5000050001 total, 0 merged
NFA Contexts: 100001 peak, 100001 total, 1 pruned
NFA: 1 matched (len 100000/100000/100000.0), 0 mismatched
NFA: 0 absorbed, 99998 skipped (len 2/99999/50000.5)
Buffers: shared hit=3, temp read=400968 written=391
-> Sort (cost=3754.09..3837.42 rows=33333 width=4) (actual time=14.938..38.911 rows=100000.00 loops=1)
Sort Key: aid
Sort Method: quicksort Memory: 3073kB
Buffers: shared hit=3, temp read=171 written=171
-> Function Scan on generate_series g (cost=0.00..1250.00 rows=33333 width=4) (actual time=5.799..11.672 rows=100000.00 loops=1)
Filter: (aid <= 100000)
Buffers: temp read=171 written=171
Planning:
Buffers: shared hit=6 read=7
Planning Time: 0.185 ms
Execution Time: 224324.532 ms

It ran without eating all the memory on my box (according to the top
command, it used around 200MB). Great! However, it took nearly 4
minutes to complete the query. I think this is because the
partition/reduced frame has 100k rows, and RPR needs to keep 100k
contexts plus 200k states at the peak. I also ran "perf top" and got
the following profile.

19.87% postgres [.] nfa_match
18.92% postgres [.] nfa_advance_state
13.60% postgres [.] nfa_cleanup_dead_contexts
11.25% postgres [.] row_is_in_reduced_frame
11.06% postgres [.] nfa_state_clone
5.52% postgres [.] nfa_route_to_elem
4.39% postgres [.] nfa_add_state_unique.isra.0
4.32% postgres [.] nfa_state_alloc
2.36% libc.so.6 [.] __memset_avx2_unaligned_erms
0.63% postgres [.] memset(at)plt

The reason nfa_match is called so many times is that it's in a loop
which iterates over all the contexts:

for (ctx = winstate->nfaContext; ctx != NULL; ctx = ctx->next)

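For what it's worth, the quadratic cost is visible in the EXPLAIN
counters. Here is a toy Python model (my sketch of the behavior, not
the actual nfa_match code): if roughly one context is created per row
and the loop above then visits every live context for each new row,
the total number of state visits is n(n+1)/2, which agrees with the
reported "NFA States: ... 5000050001 total" up to an off-by-one
(presumably an initial state).

```python
# Toy model (an assumption about the behavior, not the executor code):
# one NFA context starts at each row, and for every new row the
# nfa_match loop visits all live contexts.
def total_visits(nrows):
    live = 0
    total = 0
    for _ in range(nrows):
        live += 1      # hypothetically, a new context per row
        total += live  # the loop scans every live context
    return total

print(total_visits(100_000))  # 5,000,050,000 = n*(n+1)/2
```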
I am not saying we should enhance the NFA modules for the case of a
large reduced frame, because I don't think this is a realistic use
case, and I doubt it's worth fighting against an unrealistic scenario.

Note that even if the partition size is large, if we change the PATTERN to:

PATTERN (START UP{,3})

which generates reduced frames of only 4 rows, we get very good
performance:

EXPLAIN (ANALYZE)
SELECT aid, count(*) OVER w
FROM generate_series(1,100000) AS g(aid) WHERE aid <= 100000
WINDOW w AS (
ORDER BY aid
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
AFTER MATCH SKIP PAST LAST ROW
INITIAL
PATTERN (START UP{,3})
DEFINE START AS TRUE, UP AS aid > PREV(aid)
);
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------
WindowAgg (cost=4337.40..4504.08 rows=33333 width=14) (actual time=36.586..91.988 rows=100000.00 loops=1)
Window: w AS (ORDER BY aid ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
Pattern: start up{0,3}
Storage: Memory Maximum Storage: 17kB
NFA States: 8 peak, 325001 total, 0 merged
NFA Contexts: 5 peak, 100001 total, 0 pruned
NFA: 25000 matched (len 4/4/4.0), 0 mismatched
NFA: 0 absorbed, 75000 skipped (len 1/3/2.0)
Buffers: temp read=171 written=171
-> Sort (cost=3754.09..3837.42 rows=33333 width=4) (actual time=36.561..40.933 rows=100000.00 loops=1)
Sort Key: aid
Sort Method: quicksort Memory: 3073kB
Buffers: temp read=171 written=171
-> Function Scan on generate_series g (cost=0.00..1250.00 rows=33333 width=4) (actual time=14.297..28.692 rows=100000.00 loops=1)
Filter: (aid <= 100000)
Buffers: temp read=171 written=171
Planning Time: 0.087 ms
Execution Time: 94.810 ms
(18 rows)
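For the record, the match counters above line up with simple
arithmetic (my reading of the output, not a claim about the executor
internals): on a strictly increasing series, PATTERN (START UP{,3})
with AFTER MATCH SKIP PAST LAST ROW greedily consumes 4 rows per
match, so the 100k-row partition yields exactly the 25000 matches
EXPLAIN reports, while the peak context/state counts stay constant.

```python
# Back-of-the-envelope check against the EXPLAIN counters above.
nrows = 100_000
match_len = 4                 # START plus up to three UP rows
matches = nrows // match_len  # each match consumes 4 rows
print(matches)                # 25000, as reported by "NFA: 25000 matched"
```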

I think this is a more realistic use case than the former. If I am
correct, we don't need to work on the current NFA code to squeeze
better performance out of the unrealistic 100k reduced frame case. I
may be wrong, though, and I would love to hear from others, especially
Henson, Vik, and Jacob.

Best regards,
--
Tatsuo Ishii
SRA OSS K.K.
English: http://www.sraoss.co.jp/index_en/
Japanese: http://www.sraoss.co.jp
