Re: BUG #19366: heap-use-after-free in pgaio_io_reclaim() detected with RELCACHE_FORCE_RELEASE

From: Andres Freund <andres(at)anarazel(dot)de>
To: exclusion(at)gmail(dot)com, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #19366: heap-use-after-free in pgaio_io_reclaim() detected with RELCACHE_FORCE_RELEASE
Date: 2026-01-14 16:45:30
Message-ID: an3xpqvvga47xpazihhdijpsuor4offvt2shctqdfwkwh7liye@k2cqhszxqwva
Lists: pgsql-bugs

Hi,

On 2025-12-29 06:00:01 +0000, PG Bug reporting form wrote:
> The following bug has been logged on the website:
>
> Bug reference: 19366
> Logged by: Alexander Lakhin
> Email address: exclusion(at)gmail(dot)com
> PostgreSQL version: 18.1
> Operating system: Ubuntu 24.04
> Description:

Alexander pinged me about this - thanks, I had missed this thread!

> =================================================================
> ==1414701==ERROR: AddressSanitizer: heap-use-after-free on address
> 0x52d000160a10 at pc 0x6315765530f4 bp 0x7fff3a67b6d0 sp 0x7fff3a67b6c0
> WRITE of size 8 at 0x52d000160a10 thread T0
> #0 0x6315765530f3 in pgaio_io_reclaim
> .../src/backend/storage/aio/aio.c:698
> #1 0x6315765523dd in pgaio_io_process_completion
> [...]
> #5 0x6315765568ad in pgaio_closing_fd
> .../src/backend/storage/aio/aio.c:1279
> #6 0x6315765bf4dc in FileClose .../src/backend/storage/file/fd.c:1975
> #7 0x6315766d8285 in mdclose .../src/backend/storage/smgr/md.c:726
> #8 0x6315766e3264 in smgrrelease .../src/backend/storage/smgr/smgr.c:356
> #9 0x6315766e34af in smgrclose .../src/backend/storage/smgr/smgr.c:376
> #10 0x631576ee2edb in RelationCloseSmgr
> ../../../../src/include/utils/rel.h:597
> #11 0x631576efae6e in RelationInvalidateRelation
> .../src/backend/utils/cache/relcache.c:2527
> #12 0x631576efb3f8 in RelationClearRelation
> .../src/backend/utils/cache/relcache.c:2560
> #13 0x631576ef7582 in RelationCloseCleanup
> .../src/backend/utils/cache/relcache.c:2251
> #14 0x631576f247bf in ResOwnerReleaseRelation
> [...]
> #18 0x63157709ace5 in ResourceOwnerRelease
> .../src/backend/utils/resowner/resowner.c:661
> #19 0x631574fd4ac1 in AbortTransaction
> (.../tmp_install/usr/local/pgsql/bin/postgres+0x3437cf4) (BuildId:
> fb9da6221fd034ea4004b34de480b536444e54b6)

The problem is that for reasons I can't quite fathom, relcache cleanup happens
way earlier in resowner cleanup than I had realized. The resowner cleanup then
can trigger waiting for the IO as part of closing file descriptors, which in
turn will reference memory that was freed below AtAbort_Portals().

Importantly, at that point we haven't yet done this bit from
ResourceOwnerReleaseInternal():

    while (!dlist_is_empty(&owner->aio_handles))
    {
        dlist_node *node = dlist_head_node(&owner->aio_handles);

        pgaio_io_release_resowner(node, !isCommit);
    }

which would have removed the reference to the local memory.
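Purely as an illustration of that ordering problem, here is a toy C model. All names (LocalMemory, IoHandle, release_resowner, reclaim, simulate_abort) are hypothetical and do not correspond to PostgreSQL's actual structs or functions. A "freed" flag stands in for the deleted memory context, so the model records a would-be use-after-free instead of actually performing one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model, NOT PostgreSQL's real data structures. */
typedef struct LocalMemory
{
    bool freed;               /* stands in for "the memory context was deleted" */
    long value;               /* slot the IO completion path writes into */
    bool written_after_free;  /* set when a write would hit freed memory */
} LocalMemory;

typedef struct IoHandle
{
    LocalMemory *target;      /* reference into backend-local memory */
} IoHandle;

/* Loosely models pgaio_io_release_resowner(): drop the local reference. */
void
release_resowner(IoHandle *ioh)
{
    ioh->target = NULL;
}

/* Loosely models reclaim on completion: write through the reference if set. */
void
reclaim(IoHandle *ioh)
{
    if (ioh->target == NULL)
        return;
    if (ioh->target->freed)
        ioh->target->written_after_free = true;   /* ASan would flag this */
    else
        ioh->target->value = 0;
}

/*
 * Simulates abort: the local memory is freed first (as below
 * AtAbort_Portals()), then closing the fd waits for and reclaims the IO.
 * Returns true if that reclaim would have written to freed memory.
 */
bool
simulate_abort(bool release_handles_first)
{
    LocalMemory mem = {false, 42, false};
    IoHandle    ioh = {&mem};

    mem.freed = true;                 /* subsidiary contexts deleted */
    if (release_handles_first)
        release_resowner(&ioh);       /* the resowner loop quoted above */
    reclaim(&ioh);                    /* FileClose() -> pgaio_closing_fd() */
    if (!release_handles_first)
        release_resowner(&ioh);       /* too late to help */
    return mem.written_after_free;
}
```

In this model, simulate_abort(false) mirrors the reported ordering and returns true, while simulate_abort(true) returns false because the reference was dropped before the reclaim ran.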

Besides relcache cleanup happening so early, I'm also somewhat surprised that
AtAbort_Portals() runs so early and that it frees memory.
Note that

/*
 * Abort processing for portals.
 *
 * At this point we run the cleanup hook if present, but we can't release the
 * portal's memory until the cleanup call.
 */
void
AtAbort_Portals(void)

says that memory won't be released. Unfortunately, while that's kinda true, we
*do* already clean up some of the memory:

        /*
         * Although we can't delete the portal data structure proper, we can
         * release any memory in subsidiary contexts, such as executor state.
         * The cleanup hook was the last thing that might have needed data
         * there. But leave active portals alone.
         */
        if (portal->status != PORTAL_ACTIVE)
            MemoryContextDeleteChildren(portal->portalContext);
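To make the "kinda true" distinction concrete, here is a toy C model of deleting only a context's children. ToyContext, toy_delete, toy_delete_children, toy_at_abort and the TOY_PORTAL_* values are hypothetical stand-ins, not PostgreSQL's MemoryContext or Portal API. It only shows the shape of the behavior: the portal's own context survives, while subsidiary contexts such as executor state are gone.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CHILDREN 4

/* Hypothetical toy context tree, NOT PostgreSQL's MemoryContext API. */
typedef struct ToyContext
{
    bool alive;
    int nchildren;
    struct ToyContext *children[MAX_CHILDREN];
} ToyContext;

/* Delete a context and, recursively, everything below it. */
void
toy_delete(ToyContext *cxt)
{
    assert(cxt->nchildren <= MAX_CHILDREN);
    for (int i = 0; i < cxt->nchildren; i++)
        toy_delete(cxt->children[i]);
    cxt->nchildren = 0;
    cxt->alive = false;
}

/* Analogous to MemoryContextDeleteChildren(): children go, parent stays. */
void
toy_delete_children(ToyContext *cxt)
{
    for (int i = 0; i < cxt->nchildren; i++)
        toy_delete(cxt->children[i]);
    cxt->nchildren = 0;
}

typedef enum
{
    TOY_PORTAL_READY,
    TOY_PORTAL_ACTIVE,
    TOY_PORTAL_FAILED
} ToyPortalStatus;

/* Mirrors the status check quoted from AtAbort_Portals(). */
void
toy_at_abort(ToyPortalStatus status, ToyContext *portal_context)
{
    if (status != TOY_PORTAL_ACTIVE)
        toy_delete_children(portal_context);
}
```

For a non-active portal, the portal context stays alive but its executor-state child does not, so anything still pointing into that child (like an in-flight IO in the report) now points at freed memory.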

Not yet quite sure how to best fix this.

Greetings,

Andres Freund
