| From: | Khoa Nguyen <kdnguyen9(dot)oss(at)gmail(dot)com> |
|---|---|
| To: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> |
| Cc: | Xuneng Zhou <xunengzhou(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "itsajin(at)gmail(dot)com" <itsajin(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Bound memory usage during manual slot sync retries |
| Date: | 2026-05-28 16:42:54 |
| Message-ID: | CAONt3B0UAQtmNFLyMWTrfbP1xUFtJsyCBn-7tNKAobSy4rDkYQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, May 26, 2026 at 4:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>
> On Tue, May 26, 2026 at 1:01 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> >
> >
> >> On Mon, May 25, 2026 at 7:03 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >> >
> >> > Okay, then let's go with a per-retry memory context approach.
> >> >
> >> > @@ -579,6 +579,8 @@ drop_local_obsolete_slots(List *remote_slot_list)
> >> > local_slot->data.database));
> >> > }
> >> > }
> >> > +
> >> > + list_free(local_slots);
> >> > }
> >> >
> >> > Why do we need this retail pfree if the caller is using memory context?
> >> >
> >>
> >> I see that the latest patch in email [1] has already addressed this
> >> point. So, I'll push the v2 version.
> >>
> >> [1] - https://www.postgresql.org/message-id/CABPTF7WB4Z62sPoZkhSygOCAo3OiTDLpMELxZDuwCb3HYgM_pQ%40mail.gmail.com
> >
> >
> > Thanks. My original reasoning for adding the pfree here was to act as a safety guard in case other future callers might not manage the memory properly. But Zhijie pointed out that this double-free pattern is not favored in previous community discussions. I'll post the worst-case test and its results later.
> >
>
> Pushed.
This patch is already pushed by Amit but I wanted to add my review
from a validation standpoint. I was able to use the
measure_slotsync_memory.sh to verify the leak that Xuneng reported.
pre-patch
timestamp,backend_pid,total_bytes,delta_bytes
2026-05-27T16:52:20Z,78800,1339920,
2026-05-27T16:52:22Z,78800,1405456,65536
2026-05-27T16:52:24Z,78800,1405456,0
2026-05-27T16:52:26Z,78800,1405456,0
2026-05-27T16:52:28Z,78800,1536528,131072
2026-05-27T16:52:30Z,78800,1536528,0
2026-05-27T16:52:32Z,78800,1536528,0
patched
timestamp,backend_pid,total_bytes,delta_bytes
2026-05-27T01:17:50Z,66917,1315344,
2026-05-27T01:17:52Z,66917,1315344,0
2026-05-27T01:17:54Z,66917,1315344,0
2026-05-27T01:17:56Z,66917,1315344,0
2026-05-27T01:17:58Z,66917,1315344,0
2026-05-27T01:18:00Z,66917,1315344,0
2026-05-27T01:18:02Z,66917,1315344,0
I also reviewed the script and it is well done. The script setup and
capture data points showing memory leak within the session over time.
The patch looks fine though I think that oldctx should be captured
once outside the retry loop since the current logic is more about
parent and child memory context rather than previous and current
context. Nevertheless, the current code works as well.
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Daniel Gustafsson | 2026-05-28 16:45:56 | Re: new connection establishment (pgbench --connect) slow with pgbouncer due to libpq/OpenSSL global thread contention |
| Previous Message | Andres Freund | 2026-05-28 16:22:35 | Re: Make stack depth check work with asan's use-after-return |