
Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5

From:	Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
To:	"Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>
Cc:	Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, vignesh C <vignesh21(at)gmail(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject:	Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date:	2025-06-05 19:21:21
Message-ID:	CAD21AoCjHVR28__2TAuM5BZfgHbyYD9X=4nof3e+NdTVhg95Yw@mail.gmail.com
Lists:	pgsql-bugs

On Thu, Jun 5, 2025 at 4:07 AM Hayato Kuroda (Fujitsu)
<kuroda(dot)hayato(at)fujitsu(dot)com> wrote:
>
> Dear Amit,
>
> > > ---
> > > I'd like to make it clear again in which cases we need to execute
> > > txn->invalidations as well as txn->invalidations_distributed (as in
> > > ReorderBufferProcessTXN()) and in which cases we need to execute only
> > > txn->invalidations (as in ReorderBufferForget() and
> > > ReorderBufferAbort()). I think it might be worth adding some comments
> > > about the overall strategy somewhere.
> > >
> > > ---
> > > BTW for back branches, a simple fix without ABI breakage would be to
> > > introduce the RBTXN_INVAL_OVERFLOWED flag to limit the size of
> > > txn->invalidations. That is, we accumulate inval messages both coming
> > > from the current transaction and distributed by other transactions but
> > > once the size reaches the threshold we invalidate all caches. Is it
> > > worth considering for back branches?
> > >
> >
> > It should work and is worth considering. The main concern would be
> > that the threshold will be hit sooner than we expect in the field, judging
> > by the recent reports, so such a change has the potential to degrade
> > performance. I feel that the number of people impacted by the
> > performance degradation would be larger than the number of people impacted
> > by such an ABI change (adding the new members at the end of
> > ReorderBufferTXN). However, if we want to play it safe with respect to
> > extensions that may rely on sizeof(ReorderBufferTXN), then your
> > proposal makes sense.
>
> While considering the approach, I found a doubtful point. Consider the below
> workload:
>
> 0. S1: CREATE TABLE d(data text not null);
> 1. S1: BEGIN;
> 2. S1: INSERT INTO d VALUES ('d1');
> 3. S2: BEGIN;
> 4. S2: INSERT INTO d VALUES ('d2');
> 5. S1: ALTER PUBLICATION pb ADD TABLE d;
> 6. S1: ... lots of DDLs so overflow happens
> 7. S1: COMMIT;
> 8. S2: INSERT INTO d VALUES ('d3');
> 9. S2: COMMIT;
> 10. S2: INSERT INTO d VALUES ('d4');
>
> In this case, the inval message generated by step 5 is discarded at step 6, and
> no invalidation messages are distributed by SnapBuildDistributeSnapshotAndInval().
> While decoding S2, the relcache entry for d cannot be invalidated, so tuples d3 and
> d4 won't be replicated. Do you think this can happen?

I think that once S1's inval messages have overflowed, we should
mark the other transactions as overflowed instead of distributing inval
messages to them.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

In response to

RE: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 at 2025-06-05 11:07:22 from Hayato Kuroda (Fujitsu)

Responses

Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 at 2025-06-06 03:21:54 from Amit Kapila
