Re: Bound memory usage during manual slot sync retries

From: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
To: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shveta(dot)malik(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amitkapila16(at)gmail(dot)com>, "itsajin(at)gmail(dot)com" <itsajin(at)gmail(dot)com>, shveta malik <shvetamalik(at)gmail(dot)com>
Subject: Re: Bound memory usage during manual slot sync retries
Date: 2026-05-15 12:53:50
Message-ID: CABPTF7WB4Z62sPoZkhSygOCAo3OiTDLpMELxZDuwCb3HYgM_pQ@mail.gmail.com
Lists: pgsql-hackers

Hi Shveta, Zhijie,

On Fri, May 15, 2026 at 4:41 PM Zhijie Hou (Fujitsu)
<houzj(dot)fnst(at)fujitsu(dot)com> wrote:
>
> On Friday, May 15, 2026 2:36 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> > On Fri, May 15, 2026 at 11:03 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com>
> > wrote:
> > >
> > > Hi Hackers,
> > >
> > > pg_sync_replication_slots() now retries inside a single SQL function
> > > call. Unlike the slotsync worker, it does not get a transaction
> > > boundary between retry cycles, so allocations made while fetching and
> > > synchronizing remote slots can accumulate until the function returns.
> > >
> > > The existing list_free_deep(remote_slots) is not enough to bound this.
> > > It frees the List cells and the RemoteSlot structs stored as list
> > > elements, but it does not recursively free allocations owned by those
> > > structs, such as the copied slot name, plugin name, and database name.
> > > It also does not release unrelated per-cycle allocations made while
> > > fetching and processing the remote slots.
> > >
> > > This is probably modest in normal cases, as the retained memory is
> > > roughly proportional to the number of retries times the number of
> > > remote slots.
> > > Still, the function can wait for a long time on a lagging or
> > > misconfigured standby, so the growth is unbounded for that call.
> > >
> > > The attached patch runs each manual retry cycle in a short-lived
> > > memory context and resets it before the next attempt. The slot name
> > > list needed across retries is copied outside that context.
> > >
> > > It also adds two local cleanups. Tuple slots created with
> > > MakeSingleTupleTableSlot() are explicitly released with
> > > ExecDropSingleTupleTableSlot() before clearing the walreceiver result.
> > > The memory context would reclaim the storage eventually, but the slot
> > > also holds a reference to the result tuple descriptor, so releasing it
> > > at the matching ownership boundary seems clearer and follows nearby
> > > code.
> > >
> > > drop_local_obsolete_slots() now frees the temporary list container it
> > > builds. The retry-cycle context would reclaim this list too, but
> > > freeing it locally makes the helper self-contained.
> > >
> > > Politely CCed the original authors.
> > >
> > >
> >
> > I like the core idea of this patch. It makes the flow cleaner and protects
> > it from memory leaks when dealing with a high number of slots. I think
> > during implementation we considered having a separate memory context,
> > but since we were freeing the remote_slots then, we thought allocating a
> > new memory context might not be required. But on rethinking and reading
> > the details above, IMO, it is a good improvement.

Yeah, we need to stop the memory accumulation, either by using a
dedicated memory context or by freeing the memory manually.

> +1 to the general idea of avoiding memory accumulation.
>
> >
> > A couple of minor comments:
> >
> > 1. I think we need to delete the new memory context in
> > slotsync_failure_callback() as well.
>
> I think we can avoid doing that, because the new memory context should be
> destroyed along with its parent context on transaction abort (if any error
> occurs).

Yeah, the normal error/transaction-abort memory context cleanup seems
sufficient, and it avoids adding callback state just to delete cycle_ctx.

> Just one comment:
>
> @@ -579,6 +579,8 @@ drop_local_obsolete_slots(List *remote_slot_list)
> local_slot->data.database));
> }
> }
> +
> + list_free(local_slots);
> }
>
> I think we prefer to avoid freeing memory explicitly, since this is a
> static function and we already have a memory context in place to prevent leaks.
> (We've seen this discussion about explicit freeing multiple times before, and
> the previous conclusion was to rely on the memory context management rather than
> adding more free code.)

Thanks for pointing that out. I wasn't aware of the prior discussion; I
mainly wanted to avoid potential issues in case future callers invoke the
helper without handling the cleanup properly.

--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.

Attachment Content-Type Size
v2-0001-Bound-memory-usage-during-manual-slot-sync-retrie.patch application/octet-stream 3.9 KB
