| From: | Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com> |
|---|---|
| To: | Antonin Houska <ah(at)cybertec(dot)at> |
| Cc: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Noah Misch <noah(at)leadboat(dot)com>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, vignesh C <vignesh21(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Mihail Nikalayeu <mihailnikalayeu(at)gmail(dot)com>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Treat <rob(at)xzilla(dot)net> |
| Subject: | Re: Adding REPACK [concurrently] |
| Date: | 2026-04-07 09:41:17 |
| Message-ID: | CAFC+b6q8C+LNcxjjPa1Z-pPhQQxF8ohXmR67yKyd4uRmhXERiQ@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Tue, Apr 7, 2026 at 2:59 PM Antonin Houska <ah(at)cybertec(dot)at> wrote:
> Antonin Houska <ah(at)cybertec(dot)at> wrote:
>
> > Srinath Reddy Sadipiralla <srinath2133(at)gmail(dot)com> wrote:
> >
> > > i looked into this , it seems like valgrind catches the uninitialised
> padding bytes, which
> > > repack worker is writing using BufFileWrite, it seems this fix solved
> the problem.
> > >
> > > diff --git a/src/backend/utils/time/snapmgr.c
> b/src/backend/utils/time/snapmgr.c
> > > index 2e6197f5f35..f5682b87626 100644
> > > --- a/src/backend/utils/time/snapmgr.c
> > > +++ b/src/backend/utils/time/snapmgr.c
> > > @@ -1739,6 +1739,8 @@ SerializeSnapshot(Snapshot snapshot, char
> *start_address)
> > >
> > > Assert(snapshot->subxcnt >= 0);
> > >
> > > + MemSet(&serialized_snapshot, 0, sizeof(SerializedSnapshotData));
> > > +
> > > /* Copy all required fields */
> > > serialized_snapshot.xmin = snapshot->xmin;
> > > serialized_snapshot.xmax = snapshot->xmax;
> > >
> > > thoughts?
> >
> > Could you reproduce the failure in your environment?
> >
> > I haven't thought of this explanation because BufFileWrite() only copies
> the
> > data to a buffer in the BufFile structure and BufFileDumpBuffer() writes
> the
> > buffer. Maybe valgrind is able to track the copying?
>
> Given this message, you may be right:
>
> ==1617044== Address 0x12d745e2 is 106 bytes inside a block of size 8,272
> client-defined
>
> In my environment, the 'buffer' field starts at offset 80 into the BufFile
> structure. We first write 8 bytes into it
>
> BufFileWrite(file, &snap_size, sizeof(snap_size));
>
> followed by the snapshot. Since sizeof(SerializedSnapshotData) is 24, the
> offset 106 should be the padding following the 'takenDuringRecovery' field.
>
yeah , same in my environment aslo
==00:00:00:06.569 3479320== Syscall param pwrite64(buf) points to
uninitialised byte(s)
==00:00:00:06.569 3479320== at 0x55D6D38: pwrite (pwrite64.c:25)
==00:00:00:06.569 3479320== by 0x842EE7: pg_pwritev (pg_iovec.h:101)
==00:00:00:06.569 3479320== by 0x845F67: FileWriteV (fd.c:2280)
==00:00:00:06.569 3479320== by 0x840877: FileWrite (fd.h:245)
==00:00:00:06.569 3479320== by 0x841407: BufFileDumpBuffer
(buffile.c:538)
==00:00:00:06.569 3479320== by 0x841A67: BufFileFlush (buffile.c:724)
==00:00:00:06.569 3479320== by 0x8410B7: BufFileClose (buffile.c:418)
==00:00:00:06.569 3479320== by 0x4BFBBB: export_initial_snapshot
(repack_worker.c:346)
==00:00:00:06.569 3479320== by 0x4BF703: RepackWorkerMain
(repack_worker.c:145)
==00:00:00:06.569 3479320== by 0x765D3F: BackgroundWorkerMain
(bgworker.c:865)
==00:00:00:06.569 3479320== by 0x76C703: postmaster_child_launch
(launch_backend.c:265)
==00:00:00:06.569 3479320== by 0x774F87: StartBackgroundWorker
(postmaster.c:4197)
==00:00:00:06.569 3479320== Address 0x7b055f2 is 106 bytes inside a block
of size 8,272 client-defined
==00:00:00:06.569 3479320== at 0xB234F4: palloc (mcxt.c:1411)
==00:00:00:06.569 3479320== by 0x8408BF: makeBufFileCommon
(buffile.c:121)
==00:00:00:06.569 3479320== by 0x840C2F: BufFileCreateFileSet
(buffile.c:272)
==00:00:00:06.569 3479320== by 0x4BFB7F: export_initial_snapshot
(repack_worker.c:341)
==00:00:00:06.569 3479320== by 0x4BF703: RepackWorkerMain
(repack_worker.c:145)
==00:00:00:06.569 3479320== by 0x765D3F: BackgroundWorkerMain
(bgworker.c:865)
==00:00:00:06.569 3479320== by 0x76C703: postmaster_child_launch
(launch_backend.c:265)
==00:00:00:06.569 3479320== by 0x774F87: StartBackgroundWorker
(postmaster.c:4197)
==00:00:00:06.569 3479320== by 0x775277: maybe_start_bgworkers
(postmaster.c:4362)
==00:00:00:06.569 3479320== by 0x7739F7:
LaunchMissingBackgroundProcesses (postmaster.c:3437)
==00:00:00:06.569 3479320== by 0x770C2B: ServerLoop (postmaster.c:1737)
==00:00:00:06.569 3479320== by 0x77037B: PostmasterMain
(postmaster.c:1412)
==00:00:00:06.569 3479320== Uninitialised value was created by a stack
allocation
==00:00:00:06.569 3479320== at 0xB43008: SerializeSnapshot
(snapmgr.c:1737)
==00:00:00:06.569 3479320==
and you are spot on here with the explanation with offsets
Size snap_size; 8 bytes;
TransactionId xmin; 4 bytes
TransactionId xmax; 4 bytes
uint32 xcnt; 4 bytes
int32 subxcnt; 4 bytes
bool suboverflowed; 1 byte
bool takenDuringRecovery; 1 byte
/* 2 bytes of padding */
CommandId curcid; 4 bytes
--
Thanks,
Srinath Reddy Sadipiralla
EDB: https://www.enterprisedb.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | wenhui qiu | 2026-04-07 09:57:03 | Re: Clean up remove_rel_from_query() after self-join elimination commit |
| Previous Message | Antonin Houska | 2026-04-07 09:39:12 | Re: Adding REPACK [concurrently] |