From: | Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> |
---|---|
To: | vignesh C <vignesh21(at)gmail(dot)com> |
Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org> |
Subject: | Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5 |
Date: | 2025-06-02 16:51:03 |
Message-ID: | CAD21AoAe08b6smo7aq6zUUSANXpcfH2JVhU54LDyaBo59Y0DHg@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-bugs |
On Sun, Jun 1, 2025 at 8:13 PM vignesh C <vignesh21(at)gmail(dot)com> wrote:
>
> On Sat, 31 May 2025 at 10:20, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Fri, May 30, 2025 at 11:00 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
> > >
> > > Thank you for updating the patch. Here are some review comments:
> > >
> > > ---
> > > + /*
> > > + * Make sure there's no cache pollution. Unlike the PG_TRY part,
> > > + * this must be done unconditionally because the processing might
> > > + * fail before we reach invalidation messages.
> > > + */
> > > + if (rbtxn_inval_all_cache(txn))
> > > + InvalidateSystemCaches();
> > > + else
> > > + ReorderBufferExecuteInvalidations(txn->ninvalidations_distr,
> > > +
> > > txn->distributed_invalidations);
> > > +
> > >
> > > If we don't need to execute the distributed inval message in an error
> > > path other than detecting concurrent abort, we should describe the
> > > reason.
> > >
> >
> > The concurrent abort handling is done for streaming and prepared
> > transactions, where we send the transaction changes to the client
> > before we read COMMIT/COMMIT PREPARED/ROLLBACK/ROLLBACK PREPARED. Now,
> > among these COMMIT/ROLLBACK PREPARED cases are handled as part of a
> > prepared transaction case. For ROLLBACK, we will never perform any
> > changes from the current transaction, so we don't need distributed
> > invalidations to be executed. For COMMIT, if we encounter any errors
> > while processing changes (this is when we reach the ERROR path, which
> > is not a concurrent abort), then we will reprocess all changes and, at
> > the end, execute both the current transaction and distributed
> > invalidations. Now, one possibility is that if, after ERROR, the
> > caller does slot_advance to skip the ERROR, then we will probably miss
> > executing the distributed invalidations, leading to data loss
> > afterwards. If the above theory is correct, then it is better to
> > execute distributed invalidation even in non-concurrent-abort cases in
> > the ERROR path.
This theory seems to be correct to me. If users re-try the logical
decoding with smaller logical_decoding_work_mem the process could end
up streaming the transaction with the wrong relcache before sending
the committed transaction?
> One possible reason this scenario may not occur is that
> pg_logical_slot_get_changes_guts uses a PG_CATCH block to handle
> exceptions, during which it calls InvalidateSystemCaches to clear the
> system cache. Because of this, I believe the scenario might not
> actually happen.
> @Sawada-san / others — Are there any other cases where this could still occur?
I think that we cannot confine use cases to walsender and
pg_logical_slot_get_changes_guts() given that we expose logical
decoding API at C level.
Regards,
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com
From | Date | Subject | |
---|---|---|---|
Next Message | Anthonin Bonnefoy | 2025-06-02 17:00:26 | Re: BUG #18944: Assertion Failure in psql with idle_session_timeout Set |
Previous Message | Badia, Antonio | 2025-06-02 15:26:09 | Re: function 'IS JSON ARRAY' not identified |