| From: | Heikki Linnakangas <hlinnakangas(at)vmware(dot)com> |
|---|---|
| To: | Bruce Momjian <bruce(at)momjian(dot)us>, Andres Freund <andres(at)2ndquadrant(dot)com> |
| Cc: | <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: uninitialized values in revised prepared xact code |
| Date: | 2014-10-24 14:13:49 |
| Message-ID: | 544A5E9D.2090808@vmware.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On 10/11/2014 10:20 PM, Bruce Momjian wrote:
>
> Uh, was this fixed. I see a cleanup commit for this C file, but this
> report is from June:
>
> commit 07a4a93a0e35a778c77ffbbbc18de29e859e18f0
> Author: Heikki Linnakangas <heikki(dot)linnakangas(at)iki(dot)fi>
> Date: Fri May 16 09:47:50 2014 +0300
>
> Initialize tsId and dbId fields in WAL record of COMMIT PREPARED.
>
> Commit dd428c79 added dbId and tsId to the xl_xact_commit struct but missed
> that prepared transaction commits reuse that struct. Fix that.
>
> Because those fields were left unitialized, replaying a commit prepared WAL
> record in a hot standby node would fail to remove the relcache init file.
> That can lead to "could not open file" errors on the standby. Relcache init
> file only needs to be removed when a system table/index is rewritten in the
> transaction using two phase commit, so that should be rare in practice. In
> HEAD, the incorrect dbId/tsId values are also used for filtering in logical
> replication code, causing the transaction to always be filtered out.
>
> Analysis and fix by Andres Freund. Backpatch to 9.0 where hot standby was
> introduced.
No, that was a different issue.
(more below)
> On Mon, Jun 30, 2014 at 11:58:59AM +0200, Andres Freund wrote:
>> Hi,
>>
>> I've just rerun valgrind for the first time in a while and saw the
>> following splat. My guess is it exists since bb38fb0d43c, but that's
>> blindly guessing:
>>
>> ==2049== Use of uninitialised value of size 8
>> ==2049== at 0x4FE66D: EndPrepare (twophase.c:1063)
>> ==2049== by 0x4F231B: PrepareTransaction (xact.c:2217)
>> ==2049== by 0x4F2A38: CommitTransactionCommand (xact.c:2676)
>> ==2049== by 0x79013E: finish_xact_command (postgres.c:2408)
>> ==2049== by 0x78DE97: exec_simple_query (postgres.c:1062)
>> ==2049== by 0x791FDD: PostgresMain (postgres.c:4010)
>> ==2049== by 0x71B13B: BackendRun (postmaster.c:4113)
>> ==2049== by 0x71A86D: BackendStartup (postmaster.c:3787)
>> ==2049== by 0x71714C: ServerLoop (postmaster.c:1566)
>> ==2049== by 0x716804: PostmasterMain (postmaster.c:1219)
>> ==2049== by 0x679405: main (main.c:219)
>> ==2049== Uninitialised value was created by a stack allocation
>> ==2049== at 0x4FE16C: StartPrepare (twophase.c:942)
>> ==2049==
>> ==2049== Syscall param write(buf) points to uninitialised byte(s)
>> ==2049== at 0x5C69640: __write_nocancel (syscall-template.S:81)
>> ==2049== by 0x4FE6AE: EndPrepare (twophase.c:1064)
>> ==2049== by 0x4F231B: PrepareTransaction (xact.c:2217)
>> ==2049== by 0x4F2A38: CommitTransactionCommand (xact.c:2676)
>> ==2049== by 0x79013E: finish_xact_command (postgres.c:2408)
>> ==2049== by 0x78DE97: exec_simple_query (postgres.c:1062)
>> ==2049== by 0x791FDD: PostgresMain (postgres.c:4010)
>> ==2049== by 0x71B13B: BackendRun (postmaster.c:4113)
>> ==2049== by 0x71A86D: BackendStartup (postmaster.c:3787)
>> ==2049== by 0x71714C: ServerLoop (postmaster.c:1566)
>> ==2049== by 0x716804: PostmasterMain (postmaster.c:1219)
>> ==2049== by 0x679405: main (main.c:219)
>> ==2049== Address 0x64694ed is 1,389 bytes inside a block of size 8,192 alloc'd
>> ==2049== at 0x4C27B8F: malloc (vg_replace_malloc.c:298)
>> ==2049== by 0x8E766E: AllocSetAlloc (aset.c:853)
>> ==2049== by 0x8E8E04: MemoryContextAllocZero (mcxt.c:627)
>> ==2049== by 0x8A54D3: AtStart_Inval (inval.c:704)
>> ==2049== by 0x4F1DFC: StartTransaction (xact.c:1841)
>> ==2049== by 0x4F28D1: StartTransactionCommand (xact.c:2529)
>> ==2049== by 0x7900A7: start_xact_command (postgres.c:2383)
>> ==2049== by 0x78DAF4: exec_simple_query (postgres.c:860)
>> ==2049== by 0x791FDD: PostgresMain (postgres.c:4010)
>> ==2049== by 0x71B13B: BackendRun (postmaster.c:4113)
>> ==2049== by 0x71A86D: BackendStartup (postmaster.c:3787)
>> ==2049== by 0x71714C: ServerLoop (postmaster.c:1566)
>> ==2049== Uninitialised value was created by a stack allocation
>> ==2049== at 0x4FE16C: StartPrepare (twophase.c:942)
>>
>> It's probably just padding - twophase.c:1063 is the CRC32 computation of
>> the record data.
Yeah. The padding bytes in TwoPhaseFileHeader were not initialized.
That's simple enough to fix, but when I run valgrind, I get a lot whole
bunch of similar messages. A few are from pgstat: the padding bytes in
the pgstat messages are not initialized. One comes from
write_relcache_init_file(); again I believe it's padding bytes being
uninitialized (in FormData_pg_attribute). And one from the XLogInsert
from heap_insert; there's an uninitialized padding byte in
xl_heap_insert. And so forth.. Is it worthwhile to hunt down all of
these? If there aren't many more than these, it probably is worth it,
but I fear this might be an endless effort. Have we been clean of these
warnings at any point in the past?
- Heikki
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Heikki Linnakangas | 2014-10-24 14:21:27 | Re: pg_receivexlog --status-interval add fsync feedback |
| Previous Message | Andres Freund | 2014-10-24 14:10:55 | Re: Deferring some AtStart* allocations? |