Re: Row pattern recognition

From: Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com>
To: Tatsuo Ishii <ishii(at)postgresql(dot)org>
Cc: vik(at)postgresfriends(dot)org, david(dot)g(dot)johnston(at)gmail(dot)com, jacob(dot)champion(at)enterprisedb(dot)com, er(at)xs4all(dot)nl, peter(at)eisentraut(dot)org, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Row pattern recognition
Date: 2025-11-27 06:56:02
Message-ID: 7F1A336E-A064-4D15-90D6-182DCDFD1C9D@gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

> On Nov 27, 2025, at 11:10, Tatsuo Ishii <ishii(at)postgresql(dot)org> wrote:
>
> Hi Chao,
>
> Any comment on this?
>
>>> 13 - 0005 - nodeWindowAgg.c
>>> ```
>>> static
>>> int
>>> do_pattern_match(char *pattern, char *encoded_str, int len)
>>> {
>>> static regex_t *regcache = NULL;
>>> static regex_t preg;
>>> static char patbuf[1024]; /* most recent 'pattern' is cached here */
>>> ```
>>>
>>> Using static variable in executor seems something I have never seen, which may persistent data across queries. I think usually per query data are stored in state structures.
>>
>> Yeah, such a usage maybe rare. But I hesitate to store the data
>> (regex_t) in WindowAggState because:
>>
>> (1) it bloats WindowAggState which we want to avoid if possible
>> (2) it requires execnodes.h to include regex.h. (maybe this itself is harmless, I am not sure)

With the static function-scope variables, those data persist across queries, which is error prone. I am afraid that even if I let this pass, other reviewers might have the same concern.

Maybe define a sub structure, say WindowAggCache, and put a pointer of WindowAggCache in WindowAggState, and only allocate memory when a query needs to.

>>
>>> 14 - 0005 - nodeWindowAgg.c
>>> ```
>>> + MemoryContext oldContext = MemoryContextSwitchTo(TopMemoryContext);
>>> +
>>> + if (regcache != NULL)
>>> + pg_regfree(regcache); /* free previous re */
>>> +
>>> + /* we need to convert to char to pg_wchar */
>>> + plen = strlen(pattern);
>>> + data = (pg_wchar *) palloc((plen + 1) * sizeof(pg_wchar));
>>> + data_len = pg_mb2wchar_with_len(pattern, data, plen);
>>> + /* compile re */
>>> + sts = pg_regcomp(&preg, /* compiled re */
>>> + data, /* target pattern */
>>> + data_len, /* length of pattern */
>>> + cflags, /* compile option */
>>> + C_COLLATION_OID /* collation */
>>> + );
>>> + pfree(data);
>>> +
>>> + MemoryContextSwitchTo(oldContext);
>>> ```
>>>
>>> Here in do_pattern_match, it switches to top memory context and compiles the re and stores to “preg". Based on the comment of pg_regcomp:
>>> ```
>>> /*
>>> * pg_regcomp - compile regular expression
>>> *
>>> * Note: on failure, no resources remain allocated, so pg_regfree()
>>> * need not be applied to re.
>>> */
>>> int
>>> pg_regcomp(regex_t *re,
>>> const chr *string,
>>> size_t len,
>>> int flags,
>>> Oid collation)
>>> ```
>>>
>>> “preg” should be freed by pg_regfree(), given the memory is allocated in TopMemoryContext, this looks like a memory leak.
>>
>> I admit current patch leaves the memory unfreed even after a query
>> finishes. What about adding something like:
>>
>> static void do_pattern_match_end(void)
>> {
>> if (regcache != NULL)
>> pg_regfree(regcache);
>> }
>>
>> and let ExecEndWindowAgg() call it?
>

I’m not sure. But I think if we move the data into WindowAggState, then I guess we don’t have to use TopmemoryContext here.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2025-11-27 07:20:24 Re: IPC/MultixactCreation on the Standby server
Previous Message Michael Paquier 2025-11-27 06:33:39 Re: pgsql: doc: Fix incorrect wording for --file in pg_dump