Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5

From: vignesh C <vignesh21(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-06-02 03:13:37
Message-ID: CALDaNm0=JNjLqFzcAhujHLH1XQWcu+Jh_WJ6CCuqpjNb-fLnzg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Sat, 31 May 2025 at 10:20, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Fri, May 30, 2025 at 11:00 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> >
> > Thank you for updating the patch. Here are some review comments:
> >
> > ---
> > + /*
> > + * Make sure there's no cache pollution. Unlike the PG_TRY part,
> > + * this must be done unconditionally because the processing might
> > + * fail before we reach invalidation messages.
> > + */
> > + if (rbtxn_inval_all_cache(txn))
> > + InvalidateSystemCaches();
> > + else
> > + ReorderBufferExecuteInvalidations(txn->ninvalidations_distr,
> > +
> > txn->distributed_invalidations);
> > +
> >
> > If we don't need to execute the distributed inval message in an error
> > path other than detecting concurrent abort, we should describe the
> > reason.
> >
>
> The concurrent abort handling is done for streaming and prepared
> transactions, where we send the transaction changes to the client
> before we read COMMIT/COMMIT PREPARED/ROLLBACK/ROLLBACK PREPARED. Now,
> among these COMMIT/ROLLBACK PREPARED cases are handled as part of a
> prepared transaction case. For ROLLBACK, we will never perform any
> changes from the current transaction, so we don't need distributed
> invalidations to be executed. For COMMIT, if we encounter any errors
> while processing changes (this is when we reach the ERROR path, which
> is not a concurrent abort), then we will reprocess all changes and, at
> the end, execute both the current transaction and distributed
> invalidations. Now, one possibility is that if, after ERROR, the
> caller does slot_advance to skip the ERROR, then we will probably miss
> executing the distributed invalidations, leading to data loss
> afterwards. If the above theory is correct, then it is better to
> execute distributed invalidation even in non-concurrent-abort cases in
> the ERROR path.

One possible reason this scenario may not occur is that
pg_logical_slot_get_changes_guts uses a PG_CATCH block to handle
exceptions, during which it calls InvalidateSystemCaches to clear the
system cache. Because of this, I believe the scenario might not
actually happen.
@Sawada-san / others — Are there any other cases where this could still occur?

Regards,
Vignesh

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Michael Paquier 2025-06-02 03:17:18 Re: Standby server with cascade logical replication could not be properly stopped under load
Previous Message PG Bug reporting form 2025-06-01 19:05:25 BUG #18943: Return value of a function 'xmlBufferCreate' is dereferenced at xpath.c:177 without checking for NUL