| From: | Xuneng Zhou <xunengzhou(at)gmail(dot)com> |
|---|---|
| To: | Khoa Nguyen <kdnguyen9(dot)oss(at)gmail(dot)com> |
| Cc: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "itsajin(at)gmail(dot)com" <itsajin(at)gmail(dot)com>, "Zhijie Hou (Fujitsu)" <houzj(dot)fnst(at)fujitsu(dot)com> |
| Subject: | Re: Bound memory usage during manual slot sync retries |
| Date: | 2026-05-29 02:30:57 |
| Message-ID: | CABPTF7WpEb8hBYyLU7wKjmxuMrtjgyKN-AMLsyTb+RhS9pFatw@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
Hi Khoa,
On Fri, May 29, 2026 at 12:43 AM Khoa Nguyen <kdnguyen9(dot)oss(at)gmail(dot)com> wrote:
>
> On Tue, May 26, 2026 at 4:39 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> >
> > On Tue, May 26, 2026 at 1:01 PM Xuneng Zhou <xunengzhou(at)gmail(dot)com> wrote:
> > >
> > >
> > >> On Mon, May 25, 2026 at 7:03 PM Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > >> >
> > >> > Okay, then let's go with a per-retry memory context approach.
> > >> >
> > >> > @@ -579,6 +579,8 @@ drop_local_obsolete_slots(List *remote_slot_list)
> > >> > local_slot->data.database));
> > >> > }
> > >> > }
> > >> > +
> > >> > + list_free(local_slots);
> > >> > }
> > >> >
> > >> > Why do we need this retail pfree if the caller is using memory context?
> > >> >
> > >>
> > >> I see that the latest patch in email [1] has already addressed this
> > >> point. So, I'll push the v2 version.
> > >>
> > >> [1] - https://www.postgresql.org/message-id/CABPTF7WB4Z62sPoZkhSygOCAo3OiTDLpMELxZDuwCb3HYgM_pQ%40mail.gmail.com
> > >
> > >
> > > Thanks. My original reasoning for adding the pfree here was to act as a safety guard in case other future callers might not manage the memory properly. But Zhijie pointed out that this double-free pattern is not favored in previous community discussions. I'll post the worst-case test and its results later.
> > >
> >
> > Pushed.
>
>
> This patch is already pushed by Amit but I wanted to add my review
> from a validation standpoint. I was able to use the
> measure_slotsync_memory.sh to verify the leak that Xuneng reported.
> pre-patch
> timestamp,backend_pid,total_bytes,delta_bytes
> 2026-05-27T16:52:20Z,78800,1339920,
> 2026-05-27T16:52:22Z,78800,1405456,65536
> 2026-05-27T16:52:24Z,78800,1405456,0
> 2026-05-27T16:52:26Z,78800,1405456,0
> 2026-05-27T16:52:28Z,78800,1536528,131072
> 2026-05-27T16:52:30Z,78800,1536528,0
> 2026-05-27T16:52:32Z,78800,1536528,0
>
> patched
> timestamp,backend_pid,total_bytes,delta_bytes
> 2026-05-27T01:17:50Z,66917,1315344,
> 2026-05-27T01:17:52Z,66917,1315344,0
> 2026-05-27T01:17:54Z,66917,1315344,0
> 2026-05-27T01:17:56Z,66917,1315344,0
> 2026-05-27T01:17:58Z,66917,1315344,0
> 2026-05-27T01:18:00Z,66917,1315344,0
> 2026-05-27T01:18:02Z,66917,1315344,0
>
> I also reviewed the script and it is well done. The script setup and
> capture data points showing memory leak within the session over time.
> The patch looks fine though I think that oldctx should be captured
> once outside the retry loop since the current logic is more about
> parent and child memory context rather than previous and current
> context. Nevertheless, the current code works as well.
Thanks for your review and verification. I agree with your suggestion
that capturing the outer memory context once outside the retry loop
looks cleaner and reflect the lifetime model better. I am also
wondering whether renaming the oldctx to outerctx could express the
parent/child relationship more clearly. That said, I am ok with the
status quo, if Amit thinks no further modification is needed.
--
Regards,
Xuneng Zhou
HighGo Software Co., Ltd.
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-Use-outer-memory-context-across-slot-sync-retries.patch | application/octet-stream | 2.5 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Adam Brusselback | 2026-05-29 02:53:15 | Re: [Patch] Add WHERE clause support to REFRESH MATERIALIZED VIEW |
| Previous Message | Japin Li | 2026-05-29 02:01:41 | Re: pg_rewind does not rewind diverging timelines |