Re: valgrind errors around dsa.c

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: valgrind errors around dsa.c
Date: 2017-04-08 02:46:04
Message-ID: CAEepm=0W8u+t52zgQkXvN-1yuCauZCbZmHy7F2ZmxYtj5zEN=A@mail.gmail.com
Lists: pgsql-hackers

On Sat, Apr 8, 2017 at 8:57 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Sat, Apr 8, 2017 at 4:49 AM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> Hi,
>>
>> newly added tests exercise parallel bitmap scans. And they trigger
>> valgrind errors:
>> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2017-04-07%2007%3A10%3A01
>>
>>
>> ==4567== VALGRINDERROR-BEGIN
>> ==4567== Conditional jump or move depends on uninitialised value(s)
>> ==4567== at 0x5FD62A: check_for_freed_segments (dsa.c:2219)
>> ==4567== by 0x5FD97E: dsa_get_address (dsa.c:934)
>> ==4567== by 0x5FDA2A: init_span (dsa.c:1339)
>> ==4567== by 0x5FE6D1: ensure_active_superblock (dsa.c:1696)
>> ==4567== by 0x5FEBBD: alloc_object (dsa.c:1452)
>> ==4567== by 0x5FEBBD: dsa_allocate_extended (dsa.c:693)
>> ==4567== by 0x3C7A83: pagetable_allocate (tidbitmap.c:1536)
>> ==4567== by 0x3C7A83: pagetable_create (simplehash.h:342)
>> ==4567== by 0x3C7A83: tbm_create_pagetable (tidbitmap.c:323)
>> ==4567== by 0x3C8DAD: tbm_get_pageentry (tidbitmap.c:1246)
>> ==4567== by 0x3C98A1: tbm_add_tuples (tidbitmap.c:432)
>> ==4567== by 0x22510C: btgetbitmap (nbtree.c:460)
>> ==4567== by 0x21A8D1: index_getbitmap (indexam.c:726)
>> ==4567== by 0x38AD48: MultiExecBitmapIndexScan (nodeBitmapIndexscan.c:91)
>> ==4567== by 0x37D353: MultiExecProcNode (execProcnode.c:621)
>> ==4567== Uninitialised value was created by a heap allocation
>> ==4567== at 0x602FD5: palloc (mcxt.c:872)
>> ==4567== by 0x5FF73B: create_internal (dsa.c:1242)
>> ==4567== by 0x5FF8F5: dsa_create_in_place (dsa.c:473)
>> ==4567== by 0x37CA32: ExecInitParallelPlan (execParallel.c:532)
>> ==4567== by 0x38C324: ExecGather (nodeGather.c:152)
>> ==4567== by 0x37D247: ExecProcNode (execProcnode.c:551)
>> ==4567== by 0x39870F: ExecNestLoop (nodeNestloop.c:156)
>> ==4567== by 0x37D1B7: ExecProcNode (execProcnode.c:512)
>> ==4567== by 0x3849D4: fetch_input_tuple (nodeAgg.c:686)
>> ==4567== by 0x387764: agg_retrieve_direct (nodeAgg.c:2306)
>> ==4567== by 0x387A11: ExecAgg (nodeAgg.c:2117)
>> ==4567== by 0x37D217: ExecProcNode (execProcnode.c:539)
>> ==4567==
>>
>> It could be that these are spurious due to shared memory - valgrind
>> doesn't track definedness across processes - but the fact that memory
>> allocated by palloc is the source of the undefined memory makes me doubt
>> that.
>
> Thanks. Will post a fix for this later today.

Fix attached.

Explanation: Whenever segments are destroyed because they no longer
contain any live blocks, the shared variable
control->freed_segment_counter advances. Each attached backend has
its own local variable area->freed_segment_counter, and if it sees
that the former differs from the latter it checks all attached
segments to see if any need to be detached. I failed to initialise
the backend-local version, with the consequence that if you were very
unlucky (the uninitialised junk happening to equal the shared value)
your backend could fail to detach from a no-longer-needed segment
until another segment was eventually freed, causing the shared
counter to move again. More likely, it would notice that they are
different because one holds uninitialised junk, perform a spurious
scan for dead segments, and then bring them into sync.
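
To make the mechanism above concrete, here is a small standalone model
of it. This is only an illustrative sketch, not the attached patch and
not the real dsa.c code: the model_* types and functions, MAX_SEGMENTS
and the "scans" field are all hypothetical names invented for this
example. Only the two freed_segment_counter variables correspond to
things named in the explanation.

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SEGMENTS 4          /* hypothetical limit for this model */

/* Stand-in for the state shared by all attached backends. */
typedef struct
{
    size_t      freed_segment_counter;  /* advances when a segment is destroyed */
    bool        segment_freed[MAX_SEGMENTS];
} model_control;

/* Stand-in for one backend's private view of the area. */
typedef struct
{
    model_control *control;
    size_t      freed_segment_counter;  /* last shared value this backend saw */
    bool        segment_mapped[MAX_SEGMENTS];
    int         scans;          /* full scans performed; illustration only */
} model_area;

/* Attach a backend, mapping every segment that is still live. */
static void
model_attach(model_area *area, model_control *control)
{
    area->control = control;
    /* The essential step: start in sync instead of holding junk. */
    area->freed_segment_counter = control->freed_segment_counter;
    area->scans = 0;
    for (int i = 0; i < MAX_SEGMENTS; i++)
        area->segment_mapped[i] = !control->segment_freed[i];
}

/* Destroy a segment and advance the shared counter. */
static void
model_free_segment(model_control *control, int index)
{
    control->segment_freed[index] = true;
    control->freed_segment_counter++;
}

/* Scan for dead segments only if the shared counter has moved. */
static void
model_check_for_freed_segments(model_area *area)
{
    size_t      shared = area->control->freed_segment_counter;

    if (area->freed_segment_counter != shared)
    {
        for (int i = 0; i < MAX_SEGMENTS; i++)
        {
            if (area->segment_mapped[i] && area->control->segment_freed[i])
                area->segment_mapped[i] = false;    /* "detach" */
        }
        area->freed_segment_counter = shared;
        area->scans++;
    }
}
```

In this model, leaving out the assignment in model_attach() reproduces
both outcomes described above: a junk value that differs from the
shared counter triggers one unnecessary scan, while a junk value that
happens to match it after a segment has been freed hides that segment
until the counter moves again.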

--
Thomas Munro
http://www.enterprisedb.com

Attachment: initialise-freed-segment-counter.patch (application/octet-stream, 875 bytes)
