Re: Performance issues with v18 SQL-language-function changes

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Alexander Lakhin <exclusion(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Alexander Pyhalov <a(dot)pyhalov(at)postgrespro(dot)ru>, Bruce Momjian <bruce(at)momjian(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Subject: Re: Performance issues with v18 SQL-language-function changes
Date: 2025-05-01 00:18:01
Message-ID: 1469502.1746058681@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Alexander Lakhin <exclusion(at)gmail(dot)com> writes:
> Sorry if I've chosen the wrong thread, but starting from 0313c5dc6,
> the following script:
> CREATE TYPE aggtype AS (a int, b text);
> CREATE FUNCTION aggfns_trans(aggtype[], integer, text) RETURNS aggtype[] LANGUAGE sql AS 'SELECT
> array_append($1,ROW($2,$3)::aggtype)';
> CREATE AGGREGATE aggfns(integer, text) (SFUNC = public.aggfns_trans, STYPE = public.aggtype[], INITCOND = '{}');

> CREATE TABLE t(i int,  k int);
> INSERT INTO t SELECT 1, 2 FROM generate_series(1, 4000);

> SET statement_timeout = '10s';
> SELECT aggfns(i, repeat('x', 8192)) OVER (PARTITION BY i) FROM t;

> crashes the server for me like this:
> corrupted size vs. prev_size while consolidating

Hmm. What seems to be going on here is that once the aggfns_trans()
result gets large enough that the SQL-function-result tuplestore
decides to spill to disk, when we pull the result tuple back out
of the tuplestore with tuplestore_gettupleslot we end up with the
jf_resultSlot holding a should-free tuple pointer that points into
the tuplestore's storage. After tuplestore_clear that is a dangling
pointer, and the next use of the jf_resultSlot fails while trying to
free the tuple.

So the attached fixes it for me, but I'm still mighty confused
because I don't understand why it didn't fail in the old code.
This logic doesn't seem noticeably different from before, and
there's even a very old comment (in the SRF path) alleging that

/* NB: this might delete the slot's content, but we don't care */

Now we *do* care, but what changed?

(As an aside, seems like tuplestore is leaving money on the table,
because there's hardly any point in spilling to disk when it never
holds more than one tuple. But that's not something to change now.)

regards, tom lane

Attachment Content-Type Size
fix-dangling-tupleslot-pointer.patch text/x-diff 1.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2025-05-01 00:29:13 Re: queryId constant squashing does not support prepared statements
Previous Message David Rowley 2025-05-01 00:11:54 Re: Introduce some randomness to autovacuum