Re: BUG #19438: segfault with temp_file_limit inside cursor

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: kuzmin(dot)db4(at)gmail(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, David Rowley <dgrowleyml(at)gmail(dot)com>
Subject: Re: BUG #19438: segfault with temp_file_limit inside cursor
Date: 2026-03-27 01:02:51
Message-ID: 1106026.1774573371@sss.pgh.pa.us
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> I experimented with setting temp_file_limit within a cursor and discovered a
> segmentation fault under certain circumstances.
> The issue exists in the current minors of 14 and 15 (14.22 and 15.17), but I
> was unable to reproduce it in version 16 or higher.

> To reproduce, simply run the following code.

> begin;
> declare cur1 cursor for select c, c c2 from generate_series(0, 1000000)
> x(c) order by c;
> \o /dev/null
> fetch all from cur1;
> set temp_file_limit TO '1MB';
> fetch backward all from cur1;
> rollback;

Many thanks for the report! I confirm your results that this fails
in v14 and v15 but not later branches. However, I'm quite mystified
why v16 and v17 don't fail. The attached patch fixes it in v15,
and I think we need to apply it to all branches.

What is happening is that the last FETCH is trying to fill the
holdStore of the Portal holding the FETCH execution, and we soon run
out of work_mem and start dumping the tuples into a temp file. While
doing that, we run up against the temp_file_limit and fd.c throws an
error. This leaves the Portal's holdStore in a corrupted state, as a
result of the oversight described and fixed in the attached patch:
we've already deleted some tuples from its in-memory array, but the
tuplestore's state doesn't reflect that. Then during transaction
abort we must clean up the tuplestore (since it's part of a long-lived
data structure), and tuplestore_end therefore tries to delete all the
tuples in the in-memory array. Double free. Kaboom.

At least, that's what happens in v15 and (probably) all prior branches
for a long way back. v18 and later fortuitously avoid the failure
because they got rid of tuplestore_end's retail tuple deletion loop
in favor of a memory context deletion (cf 590b045c3). v16 and v17
*should* fail, but somehow they don't, and I don't understand why not.
I bisected it and determined that the failures stop with

c6e0fe1f2a08505544c410f613839664eea9eb21 is the first new commit
commit c6e0fe1f2a08505544c410f613839664eea9eb21
Author: David Rowley <drowley(at)postgresql(dot)org>
Date: Mon Aug 29 17:15:00 2022 +1200

Improve performance of and reduce overheads of memory management

which makes no sense whatsoever. Somehow, we are not crashing on a
double free with the new memory chunk header infrastructure. David,
have you any idea why not?

Even though no failure manifests with this example in v16+, we are
clearly at risk by leaving corrupted tuplestore state behind,
so I think the attached has to go into all branches.

regards, tom lane

Attachment Content-Type Size
0001-fix-tuplestore-corruption-15.patch text/x-diff 908 bytes
