
Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	vignesh C <vignesh21(at)gmail(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject:	Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date:	2025-06-03 19:43:50
Message-ID:	CAD21AoBhDBqavSkxr+0GCxom1Q7P7guY5Ees20Wda=YZLFVfCA@mail.gmail.com
Lists:	pgsql-bugs

On Tue, Jun 3, 2025 at 12:07 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Mon, 2 Jun 2025 at 22:49, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Thank you for updating the patch. Here are some review comments:
> >
> > + req_mem_size = sizeof(SharedInvalidationMessage) * (txn->ninvalidations_distr + nmsgs);
> > +
> > + /*
> > + * If the number of invalidation messages is larger than 8MB, it's more
> > + * efficient to invalidate the entire cache rather than processing each
> > + * message individually.
> > + */
> > + if (req_mem_size > (8 * 1024 * 1024) || rbtxn_inval_all_cache(txn))
> >
> > It's better to define the maximum number of distributed inval messages
> > per transaction as a macro instead of calculating the memory size
> > every time.
>
> Modified
>
> > ---
> > +static void
> > +ReorderBufferAddInvalidationsCommon(ReorderBuffer *rb, TransactionId xid,
> > + XLogRecPtr lsn, Size nmsgs,
> > + SharedInvalidationMessage *msgs,
> > + ReorderBufferTXN *txn,
> > + bool for_inval)
> >
> > This function is quite confusing to me. For instance,
> > ReorderBufferAddDistributedInvalidations() needs to call this function
> > with for_inval=false in spite of adding inval messages actually. Also,
> > the following condition does not seem intuitive, and there is no comment:
> >
> > if (!for_inval || (for_inval && !rbtxn_inval_all_cache(txn)))
> >
> > Instead of having ReorderBufferAddInvalidationsCommon(), I think we
> > can have a function say ReorderBufferQueueInvalidations() where we
> > enqueue the given inval messages as a
> > REORDER_BUFFER_CHANGE_INVALIDATION change.
> > ReorderBufferAddInvalidations() adds inval messages to
> > txn->invalidations and calls that function, while
> > ReorderBufferAddDistributedInvalidations() adds inval messages to
> > txn->distributed_invalidations and calls that function if the array is
> > not full.
>
> Modified
>
> > BTW if we need to invalidate all accumulated caches at the end of
> > transaction replay anyway, we don't need to add inval messages to
> > txn->invalidations once txn->distributed_invalidations gets full?
>
> Yes, there is no need to add invalidation messages to txn->invalidations once
> RBTXN_INVAL_ALL_CACHE is set. This is handled now.
>
> The attached v9 version patch has the changes for the same.
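
On the condition criticized in the quoted review, if (!for_inval || (for_inval && !rbtxn_inval_all_cache(txn))): once !for_inval is false, for_inval is already known to be true, so the inner "for_inval &&" is redundant and the test reduces to !for_inval || !rbtxn_inval_all_cache(txn). The small C sketch below checks that identity by enumerating all four inputs. It only illustrates the boolean simplification (the review itself proposes removing the function instead); cond_original, cond_simplified and conds_equivalent are hypothetical names, and the overflowed parameter stands in for rbtxn_inval_all_cache(txn).

```c
#include <assert.h>
#include <stdbool.h>

/* The condition as quoted; "overflowed" stands in for rbtxn_inval_all_cache(txn). */
static bool
cond_original(bool for_inval, bool overflowed)
{
	return !for_inval || (for_inval && !overflowed);
}

/* Simplified form: !A || (A && !B) is equivalent to !A || !B. */
static bool
cond_simplified(bool for_inval, bool overflowed)
{
	return !for_inval || !overflowed;
}

/* Returns true if both forms agree on every combination of inputs. */
static bool
conds_equivalent(void)
{
	for (int a = 0; a <= 1; a++)
		for (int b = 0; b <= 1; b++)
			if (cond_original(a, b) != cond_simplified(a, b))
				return false;
	return true;
}
```

The only input that makes either form false is for_inval == true with the cache already marked for full invalidation.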

Thank you for updating the patch. Here are review comments on v9 patch:

+/*
+ * Maximum number of distributed invalidation messages per transaction.
+ * Each message is ~16 bytes, this allows up to 8 MB of invalidation
+ * message data.
+ */
+#define MAX_DISTR_INVAL_MSG_PER_TXN 524288

The size of SharedInvalidationMessage could change in the future, so we
should derive the limit from sizeof(SharedInvalidationMessage) at compile
time instead of hard-coding it.
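A minimal sketch of the compile-time form, assuming the 8 MB budget from the v9 comment is kept: dividing the byte budget by sizeof(SharedInvalidationMessage) keeps the message limit correct if the type ever changes size. The union below is only a 16-byte stand-in for PostgreSQL's real SharedInvalidationMessage, and MAX_DISTR_INVAL_BYTES is a hypothetical helper name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical 16-byte stand-in for PostgreSQL's SharedInvalidationMessage;
 * the real type is a union of several message structs.
 */
typedef union
{
	int8_t		id;
	char		bytes[16];
} SharedInvalidationMessage;

/* Byte budget for distributed invalidation messages (hypothetical name). */
#define MAX_DISTR_INVAL_BYTES (8 * 1024 * 1024)

/*
 * Maximum number of distributed invalidation messages per transaction,
 * computed from the message size at compile time instead of hard-coded.
 */
#define MAX_DISTR_INVAL_MSG_PER_TXN \
	(MAX_DISTR_INVAL_BYTES / sizeof(SharedInvalidationMessage))

/* Holds for any message size: the limit never exceeds the byte budget. */
static_assert(MAX_DISTR_INVAL_MSG_PER_TXN * sizeof(SharedInvalidationMessage)
			  <= MAX_DISTR_INVAL_BYTES,
			  "distributed inval limit exceeds its byte budget");
```

With the 16-byte stand-in the macro evaluates to 524288, the same value v9 hard-codes; a larger message type would lower the limit automatically.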

---
+ /*
+ * If the complete cache will be invalidated, we don't need to accumulate
+ * the invalidations.
+ */
+ if (!rbtxn_inval_all_cache(txn))
+ ReorderBufferAccumulateInvalidations(&txn->ninvalidations,
+ &txn->invalidations, nmsgs, msgs);

We need a comment explaining why, unlike
ReorderBufferAddDistributedInvalidations(), we don't check the number of
invalidation messages in txn->invalidations and mark the transaction as
inval-all-cache.

---
+ /*
+ * If the number of invalidation messages is high, performing a full cache
+ * invalidation is more efficient than handling each message separately.
+ */
+ if (((nmsgs + txn->ninvalidations_distributed) > MAX_DISTR_INVAL_MSG_PER_TXN) ||
+ rbtxn_inval_all_cache(txn))
{
- txn->invalidations = (SharedInvalidationMessage *)
- repalloc(txn->invalidations, sizeof(SharedInvalidationMessage) *
- (txn->ninvalidations + nmsgs));
+ txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;

I don't think we need to mark the transaction as RBTXN_INVAL_ALL_CACHE
again when it is already marked. I'd rewrite the logic as follows:

if (txn->ninvalidations_distributed + nmsgs >= MAX_DISTR_INVAL_MSG_PER_TXN)
{
    /* mark the txn as inval-all-cache */
    ....
    /* free the accumulated inval msgs */
    ....
}

if (!rbtxn_inval_all_cache(txn))
    ReorderBufferAccumulateInvalidations(...);
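The control flow proposed above can be fleshed out as a small standalone C sketch. It is not PostgreSQL code: ReorderBufferTXN is cut down to the three fields the logic touches, malloc/realloc/free replace palloc/repalloc/pfree, the RBTXN_INVAL_ALL_CACHE bit value is a placeholder, and add_distributed_invalidations and accumulate_invalidations are hypothetical stand-ins for ReorderBufferAddDistributedInvalidations() and ReorderBufferAccumulateInvalidations(). Queueing the messages as a REORDER_BUFFER_CHANGE_INVALIDATION change is left out.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical 16-byte stand-in for SharedInvalidationMessage. */
typedef union
{
	int8_t		id;
	char		bytes[16];
} SharedInvalidationMessage;

#define MAX_DISTR_INVAL_MSG_PER_TXN \
	((8 * 1024 * 1024) / sizeof(SharedInvalidationMessage))

#define RBTXN_INVAL_ALL_CACHE 0x0001	/* placeholder bit value */
#define rbtxn_inval_all_cache(txn) \
	(((txn)->txn_flags & RBTXN_INVAL_ALL_CACHE) != 0)

/* Cut-down transaction: only the fields this logic reads or writes. */
typedef struct
{
	uint32_t	txn_flags;
	size_t		ninvalidations_distributed;
	SharedInvalidationMessage *distributed_invalidations;
} ReorderBufferTXN;

/* Append nmsgs messages to a growable array (realloc stands in for repalloc). */
static void
accumulate_invalidations(SharedInvalidationMessage **arr, size_t *n,
						 size_t nmsgs, const SharedInvalidationMessage *msgs)
{
	SharedInvalidationMessage *grown =
		realloc(*arr, sizeof(SharedInvalidationMessage) * (*n + nmsgs));

	if (grown == NULL)
		abort();
	memcpy(grown + *n, msgs, sizeof(SharedInvalidationMessage) * nmsgs);
	*arr = grown;
	*n += nmsgs;
}

/* Overflow check first, then accumulate only if not yet overflowed. */
static void
add_distributed_invalidations(ReorderBufferTXN *txn, size_t nmsgs,
							  const SharedInvalidationMessage *msgs)
{
	if (nmsgs == 0)
		return;

	if (txn->ninvalidations_distributed + nmsgs >= MAX_DISTR_INVAL_MSG_PER_TXN)
	{
		/* mark the txn as inval-all-cache */
		txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;

		/* free the accumulated inval msgs; they are no longer needed */
		free(txn->distributed_invalidations);
		txn->distributed_invalidations = NULL;
		txn->ninvalidations_distributed = 0;
	}

	if (!rbtxn_inval_all_cache(txn))
		accumulate_invalidations(&txn->distributed_invalidations,
								 &txn->ninvalidations_distributed,
								 nmsgs, msgs);
}
```

Because the count is reset to zero on overflow, later calls normally pass the size check and are then skipped by the rbtxn_inval_all_cache() test, so nothing more is accumulated and the flag is not set again.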

---
- ReorderBufferAddInvalidations(builder->reorder, txn->xid, lsn,
- ninvalidations, msgs);
+ ReorderBufferAddDistributedInvalidations(builder->reorder,
+ txn->xid, lsn,
+ ninvalidations, msgs);

I think we need some comments here to explain why we need to
distribute only inval messages coming from the current transaction.

---
+/* Should the complete cache be invalidated? */
+#define rbtxn_inval_all_cache(txn) \
+( \
+ ((txn)->txn_flags & RBTXN_INVAL_ALL_CACHE) != 0 \
+)

I think renaming the flag to something like RBTXN_INVAL_OVERFLOWED would
describe the state of the transaction more clearly.

---
Can we have a reasonable test case that covers the inval message overflow cases?

I've attached a patch with some changes and additional comments (note
that it still contains XXX comments). Please include the changes you
agree with in the next version of the patch.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

Attachment	Content-Type	Size
change_v9_masahiko.patch	application/octet-stream	8.8 KB

In response to

Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 at 2025-06-03 07:07:14 from vignesh C

Responses

Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 at 2025-06-04 06:47:55 from vignesh C
RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 at 2025-06-04 09:20:51 from Hayato Kuroda (Fujitsu)
