Re: Bound memory usage during manual slot sync retries

From: shveta malik <shveta(dot)malik(at)gmail(dot)com>
To: Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amitkapila16(at)gmail(dot)com>, itsajin(at)gmail(dot)com, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>, shveta malik <shvetamalik(at)gmail(dot)com>
Subject: Re: Bound memory usage during manual slot sync retries
Date: 2026-05-15 06:36:19
Message-ID: CAJpy0uCVo09aK-sVxeU8Kmb+ESdnHODH0hjdqbXmh4mHxivz2g@mail.gmail.com
Lists: pgsql-hackers

On Fri, May 15, 2026 at 11:03 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
>
> Hi Hackers,
>
> pg_sync_replication_slots() now retries inside a single SQL function
> call. Unlike the slotsync worker, it does not get a transaction boundary
> between retry cycles, so allocations made while fetching and synchronizing
> remote slots can accumulate until the function returns.
>
> The existing list_free_deep(remote_slots) is not enough to bound this.
> It frees the List cells and the RemoteSlot structs stored as list
> elements, but it does not recursively free allocations owned by those
> structs, such as the copied slot name, plugin name, and database name.
> It also does not release unrelated per-cycle allocations made while
> fetching and processing the remote slots.
>
> This is probably modest in normal cases, as the retained memory is roughly
> proportional to the number of retries times the number of remote slots.
> Still, the function can wait for a long time on a lagging or
> misconfigured standby, so the growth is unbounded for that call.
>
> The attached patch runs each manual retry cycle in a short-lived memory
> context and resets it before the next attempt. The slot name list needed
> across retries is copied outside that context.
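>
> A minimal sketch of the pattern, using the real memory-context APIs but
> hypothetical helper names (fetch_remote_slots, synchronize_slots,
> wait_before_retry stand in for the actual patch code):
>
> ```c
> MemoryContext cycle_ctx = AllocSetContextCreate(CurrentMemoryContext,
>                                                 "slot sync cycle",
>                                                 ALLOCSET_DEFAULT_SIZES);
> for (;;)
> {
>     MemoryContext oldctx = MemoryContextSwitchTo(cycle_ctx);
>
>     List *remote_slots = fetch_remote_slots(wrconn);
>     bool  all_synced   = synchronize_slots(remote_slots);
>
>     MemoryContextSwitchTo(oldctx);
>     MemoryContextReset(cycle_ctx);  /* drop everything palloc'd this cycle */
>
>     if (all_synced)
>         break;
>     wait_before_retry();
> }
> MemoryContextDelete(cycle_ctx);
> ```
>
> Anything that must survive across retries (such as the slot name list) is
> copied into the caller's context before the reset.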
>
> It also adds two local cleanups. Tuple slots created with
> MakeSingleTupleTableSlot() are explicitly released with
> ExecDropSingleTupleTableSlot() before clearing the walreceiver result.
> The memory context would reclaim the storage eventually, but the slot also
> holds a reference to the result tuple descriptor, so releasing it at the
> matching ownership boundary seems clearer and follows nearby code.
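>
> Roughly (the result-handling shape is assumed from nearby walreceiver
> callers, not quoted from the patch):
>
> ```c
> TupleTableSlot *slot = MakeSingleTupleTableSlot(res->tupledesc,
>                                                 &TTSOpsMinimalTuple);
>
> /* ... read rows via tuplestore_gettupleslot(res->tuplestore, ...) ... */
>
> ExecDropSingleTupleTableSlot(slot); /* drops the slot's reference to the
>                                      * result's tuple descriptor */
> walrcv_clear_result(res);
> ```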
>
> drop_local_obsolete_slots() now frees the temporary list container it
> builds. The retry-cycle context would reclaim this list too, but freeing
> it locally would make the helper self-contained.
>
> Politely CCed the original authors.
>
>

I like the core idea of this patch. It makes the flow cleaner and
protects it from memory leaks when dealing with a high number of
slots. During the original implementation, we considered having a
separate memory context, but since we were freeing remote_slots at
the time, we thought allocating a new memory context might not be
required. On re-thinking and reading the details above, IMO it is a
good improvement, but let's see what others have to say.

A couple of minor comments:

1. I think we need to delete the new memory context in
slotsync_failure_callback() as well.
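
Something along these lines, assuming the patch keeps the new context
reachable from the callback (e.g. via a static variable; the existing
cleanup in slotsync_failure_callback() is just sketched here):

```c
static void
slotsync_failure_callback(int code, Datum arg)
{
    WalReceiverConn *wrconn = (WalReceiverConn *) DatumGetPointer(arg);

    /* existing cleanup ... */
    walrcv_disconnect(wrconn);

    /*
     * Hypothetical addition: delete the per-cycle context too, so an
     * error exit does not leave it behind.
     */
    if (cycle_ctx != NULL)
    {
        MemoryContextDelete(cycle_ctx);
        cycle_ctx = NULL;
    }
}
```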

2. Also, it will be good to add a comment before switching to the
old context before extract_slot_names(). Or better, we could move the
switching inside extract_slot_names() with a comment.
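
For the second point, I mean something like this (extract_slot_names()
is named in the patch, but its body and the long-lived context name are
only assumed here for illustration):

```c
static List *
extract_slot_names(List *remote_slots)
{
    List       *slot_names = NIL;
    ListCell   *lc;
    MemoryContext oldctx;

    /*
     * Copy the names into the long-lived context: they must survive the
     * per-cycle reset performed before each retry.
     */
    oldctx = MemoryContextSwitchTo(long_lived_ctx);

    foreach(lc, remote_slots)
    {
        RemoteSlot *rs = (RemoteSlot *) lfirst(lc);

        slot_names = lappend(slot_names, pstrdup(rs->name));
    }

    MemoryContextSwitchTo(oldctx);

    return slot_names;
}
```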

thanks
Shveta
