Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5

From: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To: vignesh C <vignesh21(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-30 17:29:34
Message-ID: CAD21AoCPJS3=SY28X5X_sfKzMA8PU3y0nm16ReyboFdX8=gRfg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, May 30, 2025 at 5:15 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Fri, 30 May 2025 at 10:12, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, May 30, 2025 at 9:31 AM vignesh C <vignesh21(at)gmail(dot)com> wrote:
> > >
> > > On Thu, 29 May 2025 at 22:57, Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > > >
> > > > > I agree that chances are much lower than current if txn->invalidations
> > > > > doesn't contain invalidations from other transactions, but it is not
> > > > > clear what exactly you are trying to advocate by it. Are you trying to
> > > > > advocate that we should maintain a member similar to txn->invalidation
> > > > > (say txn->distributed_invals) instead of a queue?
> > > >
> > > > Yes, because I guess it's much simpler. I think it would not be a good
> > > > idea to introduce a new concept of accounting the memory usage of the
> > > > distributed inval messages too and serializing them, at least on back
> > > > branches. I think that In case where the txn->distriubted_inval is
> > > > about to overflow (not has to be 1GB) we can invalidate all caches
> > > > instread.
> > >
> >
> > I agree that it would be simpler, and to avoid invalid memory
> > allocation requests even for rare cases, we can have the backup logic
> > to invalidate all caches.
> >
> > > To identify overflow scenarios, I’m considering the following options:
> > > a) Introduce a new txn_flags value, such as RBTXN_INVAL_ALL_CACHE, to
> > > explicitly mark transactions that require full cache invalidation.
> > > b) Add a dedicated parameter to indicate an overflow scenario.
> > > c) setting the newly added nentries_distr to -1, to indicate an
> > > overflow scenario.
> > >
> > > Do you have any preference or thoughts on which of these approaches
> > > would be cleaner?
> > >
> >
> > I would prefer (a) as that is an explicit way to indicate that we need
> > to invalidate all caches. But let us see if Sawada-san has something
> > else in mind.
>
> The attached v6 version patch has the changes for the same.
> Thoughts?

Thank you for updating the patch. Here are some review comments:

@@ -3439,9 +3464,27 @@ ReorderBufferAddInvalidations(ReorderBuffer
*rb, TransactionId xid,
XLogRecPtr lsn, Size nmsgs,
SharedInvalidationMessage *msgs)
{
- ReorderBufferTXN *txn;
+ ReorderBufferAddInvalidationsExtended(rb, xid, lsn, nmsgs, msgs, false);
+}

If the patch is the changes for master do we need to have an extended
version of ReorderBufferAddInvalidation()?

---
+ /*
+ * Make sure there's no cache pollution. Unlike the PG_TRY part,
+ * this must be done unconditionally because the processing might
+ * fail before we reach invalidation messages.
+ */
+ if (rbtxn_inval_all_cache(txn))
+ InvalidateSystemCaches();
+ else
+ ReorderBufferExecuteInvalidations(txn->ninvalidations_distr,
+
txn->distributed_invalidations);
+

If we don't need to execute the distributed inval message in an error
path other than detecting concurrent abort, we should describe the
reason.

---
Given that we don't account the memory usage of both
txn->invalidations and txn->distributed_invalidations, probably we can
have a lower limit, say 8MB (or lower?), to avoid memory exhaustion.

---
+ if ((for_inval && !AllocSizeIsValid(req_mem_size)) ||
+ rbtxn_inval_all_cache(txn))
{
- txn->ninvalidations = nmsgs;
- txn->invalidations = (SharedInvalidationMessage *)
- palloc(sizeof(SharedInvalidationMessage) * nmsgs);
- memcpy(txn->invalidations, msgs,
- sizeof(SharedInvalidationMessage) * nmsgs);
+ txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;
+
+ if (*invalidations)
+ {
+ pfree(*invalidations);
+ *invalidations = NULL;
+ *ninvalidations = 0;
+ }

RBTXN_INVAL_ALL_CACHE seems to have an effect only on the distributed
inval messages. One question is do we need to care about the overflow
of txn->invalidations as well? If no, does it make sense to have a
separate function like ReorderBufferAddDistributedInvalidtions()
instead of having an extended version of
ReorderBufferAddInvalidations()? Some common routines can also be
declared as a static function if needed.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message PG Bug reporting form 2025-05-30 19:39:43 BUG #18941: PostgreSQL planner does not select a multicolumn btree_gin index under RLS
Previous Message Tom Lane 2025-05-30 16:59:58 Re: [EXT] Re: GSS Auth issue when user member of lots of AD groups