Re: Valgrind-detected bug in partitioning code

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Valgrind-detected bug in partitioning code
Date: 2017-01-20 20:47:20
Message-ID: CA+Tgmobs5b_E8d4oAUOuTwf=K13Mf+Jn=zNg08Hsi-Np6vwhOQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 20, 2017 at 8:09 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> skink has been unhappy since commit d26fa4f went in, but I think
> that just exposed a pre-existing bug. Running valgrind here
> duplicates the failure:
>
> ==00:00:02:01.653 16626== Conditional jump or move depends on uninitialised value(s)
> ==00:00:02:01.653 16626== at 0x4BDF6B: btint4cmp (nbtcompare.c:97)
> ==00:00:02:01.653 16626== by 0x81D6BE: FunctionCall2Coll (fmgr.c:1318)
> ==00:00:02:01.653 16626== by 0x52D584: partition_bounds_equal (partition.c:627)
> ==00:00:02:01.653 16626== by 0x80CF8E: RelationClearRelation (relcache.c:1203)
> ==00:00:02:01.653 16626== by 0x80E601: RelationCacheInvalidateEntry (relcache.c:2662)
> ==00:00:02:01.653 16626== by 0x803DD6: LocalExecuteInvalidationMessage (inval.c:568)
> ==00:00:02:01.653 16626== by 0x803F53: ProcessInvalidationMessages.clone.0 (inval.c:444)
> ==00:00:02:01.653 16626== by 0x8040C8: CommandEndInvalidationMessages (inval.c:1056)
> ==00:00:02:01.653 16626== by 0x80C719: RelationSetNewRelfilenode (relcache.c:3490)
> ==00:00:02:01.653 16626== by 0x5CD50A: ExecuteTruncate (tablecmds.c:1393)
> ==00:00:02:01.653 16626== by 0x721AC7: standard_ProcessUtility (utility.c:532)
> ==00:00:02:01.653 16626== by 0x71D943: PortalRunUtility (pquery.c:1163)
>
> IOW, partition_bounds_equal() is testing uninitialized memory during
> a TRUNCATE on a partitioned table.

Hmm. That's bad. I kind of wonder how sane it is to think that we
can invoke SQL-callable functions during a relcache reload, because
couldn't we be processing an invalidation in the context of an aborted
transaction? And I wonder why we really need or want to do that
anyway. For purposes of equalPartitionDescs(), it seems like the
relevant test is datumIsEqual(), not the equality operator derived
from the partition opclass.

But I think the immediate problem here is some fuzzy thinking about
the handling of the values taken from b1->content and b2->content.
Those have to be checked before examining values from b1->datums
and/or b2->datums, and the latter should be inspected only if the
former are both identical and both RANGE_DATUM_FINITE. I'll push a
fix for that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2017-01-20 21:01:37 Re: pg_hba_file_settings view patch
Previous Message Jesper Pedersen 2017-01-20 20:24:34 Re: Microvacuum support for Hash Index