Re: Bound memory usage during manual slot sync retries

From: Khoa Nguyen <kdnguyen9(dot)oss(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Xuneng Zhou <xunengzhou(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "itsajin(at)gmail(dot)com" <itsajin(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com>
Subject: Re: Bound memory usage during manual slot sync retries
Date: 2026-05-28 16:42:54
Message-ID: CAONt3B0UAQtmNFLyMWTrfbP1xUFtJsyCBn-7tNKAobSy4rDkYQ@mail.gmail.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, May 26, 2026 at 4:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, May 26, 2026 at 1:01 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> >
> >> On Mon, May 25, 2026 at 7:03 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >> >
> >> > Okay, then let's go with a per-retry memory context approach.
> >> >
> >> > @@ -579,6 +579,8 @@ drop_local_obsolete_slots(List *remote_slot_list)
> >> > local_slot->data.database));
> >> > }
> >> > }
> >> > +
> >> > + list_free(local_slots);
> >> > }
> >> >
> >> > Why do we need this retail pfree if the caller is using memory context?
> >> >
> >>
> >> I see that the latest patch in email [1] has already addressed this
> >> point. So, I'll push the v2 version.
> >>
> >> [1] - https://www.postgresql.org/message-id/CABPTF7WB4Z62sPoZkhSygOCAo3OiTDLpMELxZDuwCb3HYgM_pQ%40mail.gmail.com
> >
> >
> > Thanks. My original reasoning for adding the pfree here was to act as a safety guard in case other future callers might not manage the memory properly. But Zhijie pointed out that this double-free pattern is not favored in previous community discussions. I'll post the worst-case test and its results later.
> >
>
> Pushed.

This patch is already pushed by Amit but I wanted to add my review
from a validation standpoint. I was able to use the
measure_slotsync_memory.sh to verify the leak that Xuneng reported.
pre-patch
timestamp,backend_pid,total_bytes,delta_bytes
2026-05-27T16:52:20Z,78800,1339920,
2026-05-27T16:52:22Z,78800,1405456,65536
2026-05-27T16:52:24Z,78800,1405456,0
2026-05-27T16:52:26Z,78800,1405456,0
2026-05-27T16:52:28Z,78800,1536528,131072
2026-05-27T16:52:30Z,78800,1536528,0
2026-05-27T16:52:32Z,78800,1536528,0

patched
timestamp,backend_pid,total_bytes,delta_bytes
2026-05-27T01:17:50Z,66917,1315344,
2026-05-27T01:17:52Z,66917,1315344,0
2026-05-27T01:17:54Z,66917,1315344,0
2026-05-27T01:17:56Z,66917,1315344,0
2026-05-27T01:17:58Z,66917,1315344,0
2026-05-27T01:18:00Z,66917,1315344,0
2026-05-27T01:18:02Z,66917,1315344,0

I also reviewed the script and it is well done. The script setup and
capture data points showing memory leak within the session over time.
The patch looks fine though I think that oldctx should be captured
once outside the retry loop since the current logic is more about
parent and child memory context rather than previous and current
context. Nevertheless, the current code works as well.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2026-05-28 16:45:56 Re: new connection establishment (pgbench --connect) slow with pgbouncer due to libpq/OpenSSL global thread contention
Previous Message Andres Freund 2026-05-28 16:22:35 Re: Make stack depth check work with asan's use-after-return