RE: Invalid pointer access in logical decoding after error

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: RE: Invalid pointer access in logical decoding after error
Date: 2025-07-02 07:51:03
Message-ID: TYAPR01MB57246E5FE55940E7F27DDFEC9440A@TYAPR01MB5724.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jul 2, 2025 at 2:42 PM vignesh C wrote:

>
> Hi,
>
> I encountered an invalid pointer access issue. Below are the steps to
> reproduce the issue:
...
> The error occurs because entry->columns is allocated in the entry
> private context (entry->entry_cxt) by pub_collist_to_bitmapset(). This
> context is a child of the PortalContext, which is cleared after an
> error via: AbortTransaction
> -> AtAbort_Portals ->
> MemoryContextDeleteChildren -> MemoryContextDelete ->
> MemoryContextDeleteOnly
> As a result, the memory backing entry->columns is freed, but the
> RelationSyncCache which resides in CacheMemoryContext and thus
> survives the error still holds a dangling pointer to this freed
> memory, causing it to pfree an invalid pointer.
> In the normal (positive) execution flow, pgoutput_shutdown() is called
> to clean up the RelationSyncCache. This happens via:
> FreeDecodingContext -> shutdown_cb_wrapper -> pgoutput_shutdown But
> this is not called in case of an error case. To handle this case
> safely, I suggest calling FreeDecodingContext in the PG_CATCH block to
> ensure pgoutput_shutdown is invoked and the stale cache is cleared appropriately.
> Attached patch has the changes for the same.
> Thoughts?

Thank you for reporting the issue and providing a fix.

I recall that we identified this general issue with the hash table in pgoutput
in other threads as well [1]. The basic consensus [2] is that calling
FreeDecodingContext() within PG_CATCH is not ideal, as this function includes
user code, increasing the risk of encountering another error within PG_CATCH.
This scenario could prevent execution of subsequent code to invalidate syscache
entries, which is problematic.

I think a better fix could be to introduce a memory context reset callback(on
data->cachectx) and perform the actions of pgoutput_shutdown() within it.

[1] https://www.postgresql.org/message-id/OS0PR01MB57167C62D7DA4A8EBBC92B0A941BA@OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] https://www.postgresql.org/message-id/20230820210545.plmlk3rnxxgkokkj%40awork3.anarazel.de

Best Regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2025-07-02 07:55:52 Re: Fix inconsistency in the pg_buffercache documentation
Previous Message Daniel Gustafsson 2025-07-02 07:49:05 Re: Explicitly enable meson features in CI