Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-06-05 18:49:42
Message-ID: CAD21AoBaiMiAMLF-daEyB43hLbWA6fMmWWToGDMyp9V3kp149w@mail.gmail.com
Lists: pgsql-bugs

On Wed, Jun 4, 2025 at 11:20 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Thu, Jun 5, 2025 at 3:19 AM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > On Tue, Jun 3, 2025 at 11:48 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Wed, 4 Jun 2025 at 01:14, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Thank you for updating the patch. I have some comments and questions:
> >
> > In ReorderBufferAbort():
> >
> >     /*
> >      * We might have decoded changes for this transaction that could load
> >      * the cache as per the current transaction's view (consider DDL's
> >      * happened in this transaction). We don't want the decoding of future
> >      * transactions to use those cache entries so execute invalidations.
> >      */
> >     if (txn->ninvalidations > 0)
> >         ReorderBufferImmediateInvalidation(rb, txn->ninvalidations,
> >                                            txn->invalidations);
> >
> > I think that if the txn->invalidations_distributed is overflowed, we
> > would miss executing the txn->invalidations here. Probably the same is
> > true for ReorderBufferForget() and ReorderBufferInvalidate().
> >
>
> This is because of the following check "if
> (!rbtxn_inval_overflowed(txn))" in function
> ReorderBufferAddInvalidations(). What is the need of such a check in
> this function? We don't need to execute distributed invalidations in
> cases like ReorderBufferForget() when we haven't decoded any changes.
>
> > ---
> > I'd like to make it clear again which case we need to execute
> > txn->invalidations as well as txn->invalidations_distributed (like in
> > ReorderBufferProcessTXN()) and which case we need to execute only
> > txn->invalidations (like in ReorderBufferForget() and
> > ReorderBufferAbort()). I think it might be worth putting some comments
> > about overall strategy somewhere.
> >
> > ---
> > BTW for back branches, a simple fix without ABI breakage would be to
> > introduce the RBTXN_INVAL_OVERFLOWED flag to limit the size of
> > txn->invalidations. That is, we accumulate inval messages both coming
> > from the current transaction and distributed by other transactions but
> > once the size reaches the threshold we invalidate all caches. Is it
> > worth considering for back branches?
> >
>
> It should work and is worth considering. The main concern would be
> that it will hit sooner than we expect in the field, seeing the recent
> reports. So, such a change has the potential to degrade the
> performance. I feel that the number of people impacted due to
> performance would be more than the number of people impacted due to
> such an ABI change (adding the new members at the end of
> ReorderBufferTXN).

That's a fair point. I initially assumed that DDLs were not executed
often in practice, but analyzing this bug has made me realize this
assumption was misguided.
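
For reference, the back-branch idea quoted above (cap the size of
txn->invalidations and fall back to invalidating all caches once the
cap is hit) could be sketched roughly as below. This is a standalone
toy model, not PostgreSQL code: ToyTxn, ToyInvalMsg, ToyStats,
MAX_INVAL_MSGS, TOY_RBTXN_INVAL_OVERFLOWED and both functions are
hypothetical names, the threshold is deliberately tiny, and the "full
reset" is only counted rather than performed.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; not the real definitions. */
#define MAX_INVAL_MSGS              4       /* illustrative threshold */
#define TOY_RBTXN_INVAL_OVERFLOWED  0x0001  /* assumed flag bit */

typedef struct ToyInvalMsg { int cache_id; } ToyInvalMsg;

typedef struct ToyTxn
{
    unsigned     flags;
    size_t       ninvalidations;
    ToyInvalMsg *invalidations;
} ToyTxn;

/* Counts what the execute step would have done. */
typedef struct ToyStats
{
    size_t per_msg;      /* messages executed one by one */
    size_t full_resets;  /* times we fell back to "invalidate everything" */
} ToyStats;

/*
 * Accumulate messages (own or distributed) into one array. Once the total
 * would exceed the threshold, drop the array and remember only the flag.
 * Returns false only on allocation failure.
 */
static bool
toy_add_invalidations(ToyTxn *txn, const ToyInvalMsg *msgs, size_t nmsgs)
{
    ToyInvalMsg *grown;

    if (txn->flags & TOY_RBTXN_INVAL_OVERFLOWED)
        return true;            /* already overflowed, nothing to store */

    /* ninvalidations never exceeds the cap, so this cannot wrap around */
    if (nmsgs > MAX_INVAL_MSGS - txn->ninvalidations)
    {
        free(txn->invalidations);
        txn->invalidations = NULL;
        txn->ninvalidations = 0;
        txn->flags |= TOY_RBTXN_INVAL_OVERFLOWED;
        return true;
    }

    if (nmsgs == 0)
        return true;

    grown = realloc(txn->invalidations,
                    (txn->ninvalidations + nmsgs) * sizeof(ToyInvalMsg));
    if (grown == NULL)
        return false;

    memcpy(grown + txn->ninvalidations, msgs, nmsgs * sizeof(ToyInvalMsg));
    txn->invalidations = grown;
    txn->ninvalidations += nmsgs;
    return true;
}

/* Execute per-message invalidations, or a full reset after overflow. */
static void
toy_execute_invalidations(const ToyTxn *txn, ToyStats *stats)
{
    if (txn->flags & TOY_RBTXN_INVAL_OVERFLOWED)
        stats->full_resets++;
    else
        stats->per_msg += txn->ninvalidations;
}
```

The sketch also shows the cost Amit points out: with frequent DDL the
cap is reached early, and every later execution for that transaction
becomes a full cache reset instead of a handful of targeted messages.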

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
