Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-06-04 06:47:55
Message-ID: CALDaNm1MMafe_Xr3RFc0t3ds82W4CrR4=Ewi1Vh3uscpZ-rW0Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Wed, 4 Jun 2025 at 01:14, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> On Tue, Jun 3, 2025 at 12:07 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> >
> > On Mon, 2 Jun 2025 at 22:49, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > Thank you for updating the patch. Here are some review comments:
> > >
> > > + req_mem_size = sizeof(SharedInvalidationMessage) *
> > > (txn->ninvalidations_distr + nmsgs);
> > > +
> > > + /*
> > > + * If the number of invalidation messages is larger than 8MB, it's more
> > > + * efficient to invalidate the entire cache rather than processing each
> > > + * message individually.
> > > + */
> > > + if (req_mem_size > (8 * 1024 * 1024) || rbtxn_inval_all_cache(txn))
> > >
> > > It's better to define the maximum number of distributed inval messages
> > > per transaction as a macro instead of calculating the memory size
> > > every time.
> >
> > Modified
> >
> > > ---
> > > +static void
> > > +ReorderBufferAddInvalidationsCommon(ReorderBuffer *rb, TransactionId xid,
> > > + XLogRecPtr lsn, Size nmsgs,
> > > + SharedInvalidationMessage *msgs,
> > > + ReorderBufferTXN *txn,
> > > + bool for_inval)
> > >
> > > This function is quite confusing to me. For instance,
> > > ReorderBufferAddDistributedInvalidations() needs to call this function
> > > with for_inval=false in spite of adding inval messages actually. Also,
> > > the following condition seems not intuisive but there is no comment:
> > >
> > > if (!for_inval || (for_inval && !rbtxn_inval_all_cache(txn)))
> > >
> > > Instead of having ReorderBufferAddInvalidationsCommon(), I think we
> > > can have a function say ReorderBufferQueueInvalidations() where we
> > > enqueue the given inval messages as a
> > > REORDER_BUFFER_CHANGE_INVALIDATION change.
> > > ReorderBufferAddInvalidations() adds inval messages to
> > > txn->invalidations and calls that function, while
> > > ReorderBufferQueueInvalidations() adds inval messages to
> > > txn->distributed_ivnalidations and calls that function if the array is
> > > not full.
> >
> > Modified
> >
> > > BTW if we need to invalidate all accumulated caches at the end of
> > > transaction replay anyway, we don't need to add inval messages to
> > > txn->invalidations once txn->distributed_invalidations gets full?
> >
> > yes, no need to add invalidation messages to txn->invalidation once
> > RBTXN_INVAL_ALL_CACHE is set. This is handled now.
> >
> > The attached v9 version patch has the changes for the same.
>
> Thank you for updating the patch. Here are review comments on v9 patch:
>
> +/*
> + * Maximum number of distributed invalidation messages per transaction.
> + * Each message is ~16 bytes, this allows up to 8 MB of invalidation
> + * message data.
> + */
> +#define MAX_DISTR_INVAL_MSG_PER_TXN 524288
>
> The size of SharedInvalidationMessage could change in the future so we
> should calculate it at compile time.

Modified

> ---
> + /*
> + * If the complete cache will be invalidated, we don't need to accumulate
> + * the invalidations.
> + */
> + if (!rbtxn_inval_all_cache(txn))
> + ReorderBufferAccumulateInvalidations(&txn->ninvalidations,
> + &txn->invalidations, nmsgs, msgs);
>
> We need to explain why we don't check the number of invalidation
> messages for txn->invalidations and mark it as inval-all-cache, unlike
> ReorderBufferAddDistributedInvalidations().

Added comments

> ---
> + /*
> + * If the number of invalidation messages is high, performing a full cache
> + * invalidation is more efficient than handling each message separately.
> + */
> + if (((nmsgs + txn->ninvalidations_distributed) >
> MAX_DISTR_INVAL_MSG_PER_TXN) ||
> + rbtxn_inval_all_cache(txn))
> {
> - txn->invalidations = (SharedInvalidationMessage *)
> - repalloc(txn->invalidations, sizeof(SharedInvalidationMessage) *
> - (txn->ninvalidations + nmsgs));
> + txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;
>
> I think we don't need to mark the transaction as RBTXN_INVAL_ALL_CACHE
> again. I'd rewrite the logic as follows:
>
> if (txn->ninvalidations_distributed + nmsgs >= MAX_DISTR_INVAL_MSG_PER_TXN)
> {
> /* mark the txn as inval-all-cache */
> ....
> /* free the accumulated inval msgs */
> ....
> }
>
> if (!rbtxn_inval_all_cache(txn))
> ReorderBufferAccumulateInvalidations(...);

Modified

> ---
> - ReorderBufferAddInvalidations(builder->reorder, txn->xid, lsn,
> - ninvalidations, msgs);
> + ReorderBufferAddDistributedInvalidations(builder->reorder,
> + txn->xid, lsn,
> + ninvalidations, msgs);
>
> I think we need some comments here to explain why we need to
> distribute only inval messages coming from the current transaction.

Added comments

> ---
> +/* Should the complete cache be invalidated? */
> +#define rbtxn_inval_all_cache(txn) \
> +( \
> + ((txn)->txn_flags & RBTXN_INVAL_ALL_CACHE) != 0 \
> +)
>
> I find that if we rename the flag to something like
> RBTXN_INVAL_OVERFLOWED, it would explain the state of the transaction
> clearer.

Modified

> Can we have a reasonable test case that covers the inval message overflow cases?
One of us will work on this and post a separate patch

> I've attached a patch for some changes and adding more comments (note
> that it still has XXX comments). Please include these changes that you
> agreed with in the next version patch.

Thanks for the comments, I merged it.

The attached v10 version patch has the changes for the same.

Regards,
Vignesh

Attachment Content-Type Size
v10-master-0001-Fix-exponential-memory-allocation-issue-i.patch application/octet-stream 13.0 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Hayato Kuroda (Fujitsu) 2025-06-04 09:20:51 RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Previous Message Michael Paquier 2025-06-04 00:44:03 Re: BUG #18920: LOAD '$libdir/plugins' no longer works in 18beta1