Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com>
Cc: vignesh C <vignesh21(at)gmail(dot)com>, "Hayato Kuroda (Fujitsu)" <kuroda(dot)hayato(at)fujitsu(dot)com>, Duncan Sands <duncan(dot)sands(at)deepbluecap(dot)com>, "pgsql-bugs(at)lists(dot)postgresql(dot)org" <pgsql-bugs(at)lists(dot)postgresql(dot)org>
Subject: Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Date: 2025-05-31 04:50:45
Message-ID: CAA4eK1JQ2UQEbKdWUa0ozh3NdJFMJXfAKRW-vFrC3JqKxmx1FQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Fri, May 30, 2025 at 11:00 PM Masahiko Sawada <sawada(dot)mshk(at)gmail(dot)com> wrote:
>
> Thank you for updating the patch. Here are some review comments:
>
> ---
> + /*
> + * Make sure there's no cache pollution. Unlike the PG_TRY part,
> + * this must be done unconditionally because the processing might
> + * fail before we reach invalidation messages.
> + */
> + if (rbtxn_inval_all_cache(txn))
> + InvalidateSystemCaches();
> + else
> + ReorderBufferExecuteInvalidations(txn->ninvalidations_distr,
> +
> txn->distributed_invalidations);
> +
>
> If we don't need to execute the distributed inval message in an error
> path other than detecting concurrent abort, we should describe the
> reason.
>

The concurrent abort handling is done for streaming and prepared
transactions, where we send the transaction changes to the client
before we read COMMIT/COMMIT PREPARED/ROLLBACK/ROLLBACK PREPARED. Now,
among these COMMIT/ROLLBACK PREPARED cases are handled as part of a
prepared transaction case. For ROLLBACK, we will never perform any
changes from the current transaction, so we don't need distributed
invalidations to be executed. For COMMIT, if we encounter any errors
while processing changes (this is when we reach the ERROR path, which
is not a concurrent abort), then we will reprocess all changes and, at
the end, execute both the current transaction and distributed
invalidations. Now, one possibility is that if, after ERROR, the
caller does slot_advance to skip the ERROR, then we will probably miss
executing the distributed invalidations, leading to data loss
afterwards. If the above theory is correct, then it is better to
execute distributed invalidation even in non-concurrent-abort cases in
the ERROR path.

> ---
> Given that we don't account the memory usage of both
> txn->invalidations and txn->distributed_invalidations, probably we can
> have a lower limit, say 8MB (or lower?), to avoid memory exhaustion.
>

Are you thinking if there are many waslenders in the system, and we
kept the limit higher, say 256 MB, then all will start consuming that
much memory, leading to memory exhaustion? If so, then I agree with
your point to keep this limit as 8MB, we can probably explain the same
in comments as well.

> ---
> + if ((for_inval && !AllocSizeIsValid(req_mem_size)) ||
> + rbtxn_inval_all_cache(txn))
> {
> - txn->ninvalidations = nmsgs;
> - txn->invalidations = (SharedInvalidationMessage *)
> - palloc(sizeof(SharedInvalidationMessage) * nmsgs);
> - memcpy(txn->invalidations, msgs,
> - sizeof(SharedInvalidationMessage) * nmsgs);
> + txn->txn_flags |= RBTXN_INVAL_ALL_CACHE;
> +
> + if (*invalidations)
> + {
> + pfree(*invalidations);
> + *invalidations = NULL;
> + *ninvalidations = 0;
> + }
>
> RBTXN_INVAL_ALL_CACHE seems to have an effect only on the distributed
> inval messages. One question is do we need to care about the overflow
> of txn->invalidations as well?
>

I don't think so. This has been working for a long time, so unless we
see a case where the overflow can happen, it is better not to change
it.

>
If no, does it make sense to have a
> separate function like ReorderBufferAddDistributedInvalidtions()
> instead of having an extended version of
> ReorderBufferAddInvalidations()? Some common routines can also be
> declared as a static function if needed.
>

+1.

--
With Regards,
Amit Kapila.

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message vignesh C 2025-05-31 07:57:52 Re: Logical replication 'invalid memory alloc request size 1585837200' after upgrading to 17.5
Previous Message Pepe Fagoaga 2025-05-30 21:14:26 Re: BUG #18941: PostgreSQL planner does not select a multicolumn btree_gin index under RLS