From: | "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
Subject: | RE: Invalid pointer access in logical decoding after error |
Date: | 2025-07-02 07:51:03 |
Message-ID: | TYAPR01MB57246E5FE55940E7F27DDFEC9440A@TYAPR01MB5724.jpnprd01.prod.outlook.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Jul 2, 2025 at 2:42 PM vignesh C wrote:
>
> Hi,
>
> I encountered an invalid pointer access issue. Below are the steps to
> reproduce the issue:
...
> The error occurs because entry->columns is allocated in the entry
> private context (entry->entry_cxt) by pub_collist_to_bitmapset(). This
> context is a child of the PortalContext, which is cleared after an
> error via: AbortTransaction
> -> AtAbort_Portals ->
> MemoryContextDeleteChildren -> MemoryContextDelete ->
> MemoryContextDeleteOnly
> As a result, the memory backing entry->columns is freed, but the
> RelationSyncCache which resides in CacheMemoryContext and thus
> survives the error still holds a dangling pointer to this freed
> memory, causing it to pfree an invalid pointer.
> In the normal (positive) execution flow, pgoutput_shutdown() is called
> to clean up the RelationSyncCache. This happens via:
> FreeDecodingContext -> shutdown_cb_wrapper -> pgoutput_shutdown But
> this is not called in case of an error case. To handle this case
> safely, I suggest calling FreeDecodingContext in the PG_CATCH block to
> ensure pgoutput_shutdown is invoked and the stale cache is cleared appropriately.
> Attached patch has the changes for the same.
> Thoughts?
Thank you for reporting the issue and providing a fix.
I recall that we identified this general issue with the hash table in pgoutput
in other threads as well [1]. The basic consensus [2] is that calling
FreeDecodingContext() within PG_CATCH is not ideal, as this function includes
user code, increasing the risk of encountering another error within PG_CATCH.
This scenario could prevent execution of subsequent code to invalidate syscache
entries, which is problematic.
I think a better fix could be to introduce a memory context reset callback(on
data->cachectx) and perform the actions of pgoutput_shutdown() within it.
[1] https://www.postgresql.org/message-id/OS0PR01MB57167C62D7DA4A8EBBC92B0A941BA@OS0PR01MB5716.jpnprd01.prod.outlook.com
[2] https://www.postgresql.org/message-id/20230820210545.plmlk3rnxxgkokkj%40awork3.anarazel.de
Best Regards,
Hou zj
From | Date | Subject | |
---|---|---|---|
Next Message | Daniel Gustafsson | 2025-07-02 07:55:52 | Re: Fix inconsistency in the pg_buffercache documentation |
Previous Message | Daniel Gustafsson | 2025-07-02 07:49:05 | Re: Explicitly enable meson features in CI |