RE: Bound memory usage during manual slot sync retries

From: "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
To: shveta malik <shveta(dot)malik(at)gmail(dot)com>, Xuneng Zhou <xunengzhou(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Amit Kapila <amitkapila16(at)gmail(dot)com>, "itsajin(at)gmail(dot)com" <itsajin(at)gmail(dot)com>, shveta malik <shvetamalik(at)gmail(dot)com>
Subject: RE: Bound memory usage during manual slot sync retries
Date: 2026-05-15 08:41:09
Message-ID: TY4PR01MB177184079FF3A1EE4DE129A0994042@TY4PR01MB17718.jpnprd01.prod.outlook.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Friday, May 15, 2026 2:36 PM shveta malik <shveta(dot)malik(at)gmail(dot)com> wrote:
> On Fri, May 15, 2026 at 11:03 AM Xuneng Zhou <xunengzhou(at)gmail(dot)com>
> wrote:
> >
> > Hi Hackers,
> >
> > pg_sync_replication_slots() now retries inside a single SQL function
> > call. Unlike the slotsync worker, it does not get a transaction
> > boundary between retry cycles, so allocations made while fetching and
> > synchronizing remote slots can accumulate until the function returns.
> >
> > The existing list_free_deep(remote_slots) is not enough to bound this.
> > It frees the List cells and the RemoteSlot structs stored as list
> > elements, but it does not recursively free allocations owned by those
> > structs, such as the copied slot name, plugin name, and database name.
> > It also does not release unrelated per-cycle allocations made while
> > fetching and processing the remote slots.
> >
> > This is probably modest in normal cases, as the retained memory is
> > roughly proportional to the number of retries times the number of remote
> slots.
> > Still, the function can wait for a long time on a lagging or
> > misconfigured standby, so the growth is unbounded for that call.
> >
> > The attached patch runs each manual retry cycle in a short-lived
> > memory context and resets it before the next attempt. The slot name
> > list needed across retries is copied outside that context.
> >
> > It also adds two local cleanups. Tuple slots created with
> > MakeSingleTupleTableSlot() are explicitly released with
> > ExecDropSingleTupleTableSlot() before clearing the walreceiver result.
> > The memory context would reclaim the storage eventually, but the slot
> > also holds a reference to the result tuple descriptor, so releasing it
> > at the matching ownership boundary seems clearer and follows nearby
> code.
> >
> > drop_local_obsolete_slots() now frees the temporary list container it
> > builds. The retry-cycle context would reclaim this list too, but
> > freeing it locally would make the helper self-contained.
> >
> > Politely CCed the original authors.
> >
> >
>
> I like the core idea of this patch. It makes the flow cleaner and protects the
> flow from memory-leaks when dealing with a high number of slots. I think
> during implementation, we considered having a separate memory context,
> but since we were freeing the remote_slots then, we thought allocating a new
> memory context might not be required. But on re-thinking and reading the
> details above, IMO, it is a good improvement.

+1 to the general idea of avoiding memory accumulation.

>
> A couple of minor comments:
>
> 1. I think we need to delete the new memory context in
> slotsync_failure_callback() as well.

I think we can avoid doing that, because the new memory context should be
destroyed along with its parent context on transaction abort (if any error
occurs).

~~~

Just one comment:

@@ -579,6 +579,8 @@ drop_local_obsolete_slots(List *remote_slot_list)
local_slot->data.database));
}
}
+
+ list_free(local_slots);
}

I think we prefer to avoid freeing memory explicitly, since this is a
static function and we already have a memory context in place to prevent leaks.
(We've seen this discussion about explicit freeing multiple times before, and
the previous conclusion was to rely on the memory context management rather than
adding more free code.)

Best Regards,
Hou zj

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Etsuro Fujita 2026-05-15 08:45:43 Re: postgres_fdw: restore_stats uses current user's mapping instead of table owner's during ANALYZE
Previous Message Chao Li 2026-05-15 08:34:41 Re: Adjust pg_stat_get_lock() prorows to match lock types