Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org, Sawada Masahiko <sawada(dot)mshk(at)gmail(dot)com>
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-21 11:12:15
Message-ID: CAA4eK1LMgqeT_bPZ3MH-VKvwOqpZyfJmF7knZhu1rqt2Pqsnwg@mail.gmail.com
Lists: pgsql-bugs

On Wed, May 21, 2025 at 11:18 AM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Mon, May 19, 2025 at 8:08 PM Duncan Sands
> <duncan(dot)sands(at)deepbluecap(dot)com> wrote:
> >
> > While it is long, it doesn't seem to merit allocating anything like 1GB of
> > memory. So I'm guessing that postgres is miscalculating the required size somehow.
> >
>
> We fixed a bug in commit 4909b38af0 to distribute invalidation at the
> transaction end to avoid data loss in certain cases, which could cause
> such a problem. I am wondering that even prior to that commit, we
> would eventually end up allocating the required memory for a
> transaction for all the invalidations because of repalloc in
> ReorderBufferAddInvalidations, so why it matter with this commit? One
> possibility is that we need allocations for multiple in-progress
> transactions now.
>

I think the problem here is that when we distribute invalidations to a
concurrent transaction, in addition to queuing them as a change, we
also append them to that transaction's own invalidations via repalloc
in ReorderBufferAddInvalidations. So, when there are many in-progress
transactions, each one copies all of its accumulated invalidations,
including those it received from others, to the remaining in-progress
transactions. This can compound into a very large allocation request
size. However, once the change has been queued, we don't need to copy
it alongside the original transaction's invalidations. The copy is
only required for cases where we don't process any changes, such as
ReorderBufferForget(). I have analyzed all such cases, and my analysis
is as follows:

ReorderBufferForget()
------------------------------
It is okay not to perform the invalidations that we got from other
concurrent transactions during ReorderBufferForget(). This is because
ReorderBufferForget() executes invalidations when we skip the
transaction being decoded, as it is not from a database of interest.
So, we execute them only to invalidate shared catalogs (see the
comment at the caller of ReorderBufferForget()). It is sufficient to
execute such invalidations in the source transaction only, because the
transaction being skipped wouldn't have loaded anything in the shared
catalog.

ReorderBufferAbort()
-----------------------------
ReorderBufferAbort() processes invalidations when it has already
streamed some changes. Whenever it streamed a change, it would have
processed the concurrent transactions' invalidation messages that
happened before the statement that led to streaming. That should be
sufficient for us.

Consider the following variant of the original case that required the
distribution of invalidations:
1) S1: CREATE TABLE d(data text not null);
2) S1: INSERT INTO d VALUES('d1');
3) S2: BEGIN; INSERT INTO d VALUES('d2');
4) S1: ALTER PUBLICATION pb ADD TABLE d;
5) S2: INSERT INTO unrelated_tab VALUES(1);
6) S2: ROLLBACK;
7) S2: INSERT INTO d VALUES('d3');
8) S1: INSERT INTO d VALUES('d4');

The problem with the sequence is that the insert from 3) could be
decoded *after* 4) in step 5) due to streaming, and that to decode the
insert (which happened before the ALTER) the catalog snapshot and
cache state are from *before* the ALTER PUBLICATION. Because the
transaction started in 3) doesn't modify any catalogs, no
invalidations are executed after decoding it. The result could be that
the cache looks like it did at 3), not like after 4). However, this
won't create a problem because while streaming at 5), we would execute
the invalidations from S1 due to the change added via message
REORDER_BUFFER_CHANGE_INVALIDATION in ReorderBufferAddInvalidations.
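
To make the argument for the abort case a bit more concrete, here is a
small toy model in Python (not PostgreSQL code). It assumes one cache
entry per table and ignores historic snapshots entirely; the names
run_scenario, is_published and queue_invalidation are made up for
illustration. It only sketches why replaying the queued invalidation
change during streaming should be enough for step 7 to see the
publication change.

```python
def run_scenario(queue_invalidation):
    """Toy model of steps 2) to 7); returns how step 7 sees table d."""
    # Assumption: the catalog is a dict "table -> is it in publication pb?".
    catalog = {"d": False, "unrelated_tab": False}
    cache = {}  # the decoder's cached view of the catalog

    def is_published(table):
        # Build the cache entry on a miss, otherwise reuse the cached value.
        if table not in cache:
            cache[table] = catalog[table]
        return cache[table]

    is_published("d")                   # step 2: decoding S1's insert caches d
    s2_changes = [("insert", "d")]      # step 3: S2 inserts 'd2'
    catalog["d"] = True                 # step 4: S1's ALTER PUBLICATION commits
    if queue_invalidation:
        # S1's invalidation is distributed to S2 as a queued change only.
        s2_changes.append(("inval", "d"))
    s2_changes.append(("insert", "unrelated_tab"))  # step 5: triggers streaming

    for kind, table in s2_changes:      # streaming replays the queue in order
        if kind == "inval":
            cache.pop(table, None)      # evict the stale entry
        else:
            is_published(table)

    # step 6: ROLLBACK executes nothing further in this model.
    return is_published("d")            # step 7: S2 inserts 'd3'


print(run_scenario(queue_invalidation=True))   # True: d3 would be published
print(run_scenario(queue_invalidation=False))  # False: stale entry hides d3
```

In this model the queued change alone is what evicts the stale entry;
no separate copy in the transaction's invalidation array is involved.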

ReorderBufferInvalidate()
--------------------------------
The reasoning is the same as for ReorderBufferForget(): it executes
invalidations for the same purpose, but it is a separate function so
that it can avoid the cleanup of the buffer at the end.

XLOG_XACT_INVALIDATIONS
-------------------------------------------
While processing XLOG_XACT_INVALIDATIONS, we don't need invalidations
accumulated from other xacts because this is a special case to execute
invalidations from a particular command (DDL) in a transaction. It
won't build any cache, so it can't create any invalid state.
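
As a rough illustration of the growth described at the top of this
mail, here is a toy model in Python (again, not PostgreSQL code). It
assumes a worst case where the transactions commit one after another
while all later ones are still in progress, and where each starts with
the same number of its own messages. The function name and its
parameters are hypothetical.

```python
def largest_inval_array(num_txns, own_msgs, copy_into_array):
    """Return the largest per-transaction invalidation array, in messages."""
    # invals[i]: size of transaction i's accumulated invalidation array.
    invals = [own_msgs] * num_txns
    # queued[i]: messages queued for transaction i as invalidation changes.
    queued = [0] * num_txns
    largest = own_msgs

    for committing in range(num_txns):
        # At commit, everything accumulated so far is distributed.
        to_distribute = invals[committing]
        for other in range(committing + 1, num_txns):
            queued[other] += to_distribute       # always queued as a change
            if copy_into_array:
                # Behaviour discussed above: also appended to the array,
                # so it gets distributed again when 'other' commits.
                invals[other] += to_distribute
                largest = max(largest, invals[other])
    return largest


print(largest_inval_array(4, 10, copy_into_array=True))   # 80
print(largest_inval_array(4, 10, copy_into_array=False))  # 10
```

Under these assumptions the copied array doubles with every commit,
while queuing alone leaves each array at its original size.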

--
With Regards,
Amit Kapila.
