Invalid pointer access in logical decoding after error

From: vignesh C <vignesh21(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Invalid pointer access in logical decoding after error
Date: 2025-07-02 06:42:17
Message-ID: CALDaNm0x-aCehgt8Bevs2cm=uhmwS28MvbYq1=s2Ekf0aDPkOA@mail.gmail.com
Lists: pgsql-hackers

Hi,

I encountered an invalid pointer access issue. Below are the steps to
reproduce the issue:
-- Create table
CREATE TABLE t1(c1 int, c2 int);

-- Create publications with each publication selecting a different column
CREATE PUBLICATION pub1 FOR TABLE t1(c1);
CREATE PUBLICATION pub2 FOR TABLE t1(c2);

-- Create slot
SELECT * FROM pg_create_logical_replication_slot('test', 'pgoutput');

-- Insert a couple of records
INSERT INTO t1 VALUES(1,1);
INSERT INTO t1 VALUES(2,2);

-- Call pg_logical_slot_get_binary_changes, which throws an error because
-- the two publications use different column lists
postgres=# SELECT * FROM pg_logical_slot_get_binary_changes('test',
NULL, NULL, 'proto_version', '4', 'publication_names', 'pub1,pub2');
ERROR: cannot use different column lists for table "public.t1" in
different publications
CONTEXT: slot "test", output plugin "pgoutput", in the change
callback, associated LSN 0/14C3C30

-- The second call triggers the issue: we try to free an invalid pointer
postgres=# SELECT * FROM pg_logical_slot_get_binary_changes('test',
NULL, NULL, 'proto_version', '4', 'publication_names', 'pub1,pub2');
ERROR: pfree called with invalid pointer 0x58983541e6b8 (header
0x6563617073656d61)
CONTEXT: slot "test", output plugin "pgoutput", in the change
callback, associated LSN 0/14C3C30

The error occurs because entry->columns is allocated by
pub_collist_to_bitmapset() in the entry's private context
(entry->entry_cxt). This context is a child of PortalContext, whose
children are deleted after an error via:
  AbortTransaction -> AtAbort_Portals -> MemoryContextDeleteChildren ->
  MemoryContextDelete -> MemoryContextDeleteOnly
As a result, the memory backing entry->columns is freed, but
RelationSyncCache, which resides in CacheMemoryContext and thus
survives the error, still holds a dangling pointer to it. The next
decoding call then tries to pfree that pointer and fails with the
error shown above.
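
To make the lifetime mismatch concrete, here is a toy model in plain C.
It is not PostgreSQL code: ToyContext, ToyEntry and all toy_* functions
are made-up stand-ins, and the liveness flag exists only so the stale
state can be observed without actually touching freed memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy stand-in for a short-lived memory context holding one allocation. */
typedef struct ToyContext
{
    void *chunk;   /* the single allocation owned by this context */
    bool  alive;   /* false once error cleanup has released the context */
} ToyContext;

/* Toy stand-in for a cache entry that lives in long-lived memory. */
typedef struct ToyEntry
{
    ToyContext *entry_cxt;  /* short-lived context the entry points into */
    int        *columns;    /* allocated inside entry_cxt */
} ToyEntry;

/* Survives errors, like a cache kept in a long-lived context. */
ToyEntry toy_cache_entry;

/* Fill the cache entry with data allocated in the short-lived context. */
void toy_fill_entry(ToyContext *cxt)
{
    cxt->chunk = malloc(sizeof(int));
    assert(cxt->chunk != NULL);
    cxt->alive = true;

    toy_cache_entry.entry_cxt = cxt;
    toy_cache_entry.columns = cxt->chunk;
    *toy_cache_entry.columns = 1;
}

/* Error cleanup: releases the context but knows nothing about the cache. */
void toy_abort_cleanup(ToyContext *cxt)
{
    free(cxt->chunk);
    cxt->chunk = NULL;
    cxt->alive = false;
}

/* True if the cache entry still refers to a context that was released. */
bool toy_entry_is_dangling(void)
{
    return toy_cache_entry.entry_cxt != NULL &&
           !toy_cache_entry.entry_cxt->alive;
}
```

After toy_abort_cleanup() the entry is left untouched, so
toy_entry_is_dangling() reports true. Freeing toy_cache_entry.columns
at that point would be the analogue of the pfree failure above.
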
In the normal (non-error) execution flow, pgoutput_shutdown() is called
to clean up the RelationSyncCache. This happens via:
  FreeDecodingContext -> shutdown_cb_wrapper -> pgoutput_shutdown
However, this path is not taken when an error occurs. To handle this
case safely, I suggest calling FreeDecodingContext in the PG_CATCH block
to ensure that pgoutput_shutdown is invoked and the stale cache is
cleared. The attached patch implements this change.
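
The idea can be sketched with a small standalone C model that uses
setjmp/longjmp in place of PG_TRY/PG_CATCH. This is only an
illustration under stated assumptions, not the patch itself: every
toy_* name is hypothetical, and unlike the real code the sketch does
not re-throw the error after cleanup.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

/* Jump target that plays the role of the PG_TRY block's error handler. */
jmp_buf toy_error_jmp;

/* True while the long-lived cache holds entries built by the callback. */
bool toy_cache_valid = false;

/* Stand-in for the plugin shutdown callback that drops the cache. */
void toy_shutdown(void)
{
    toy_cache_valid = false;
}

/* Stand-in for the change callback: fills the cache, then may fail. */
void toy_decode(bool fail)
{
    toy_cache_valid = true;
    if (fail)
        longjmp(toy_error_jmp, 1);   /* analogous to raising an ERROR */
}

/*
 * Returns true on success, false if decoding raised an error.
 * cleanup_on_error selects whether the error path also runs shutdown.
 */
bool toy_get_changes(bool fail, bool cleanup_on_error)
{
    if (setjmp(toy_error_jmp) == 0)
    {
        toy_decode(fail);
        toy_shutdown();              /* normal path always cleans up */
        return true;
    }

    /* Error path, analogous to the PG_CATCH block. */
    if (cleanup_on_error)
        toy_shutdown();
    return false;
}
```

With cleanup_on_error set to false, a failed call leaves
toy_cache_valid set, which mirrors the stale RelationSyncCache. With it
set to true, the error path resets the cache just as the normal path
does.
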
Thoughts?

Regards,
Vignesh

Attachment Content-Type Size
v1-0001-Fix-referencing-invalid-pointer-in-logical-decodi.patch application/octet-stream 1.6 KB
