Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values

From: Noah Misch <noah(at)leadboat(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Date: 2017-08-05 22:55:13
Message-ID: 20170805225510.GA3178002@rfd.leadboat.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Aug 03, 2017 at 11:42:25AM -0700, Peter Geoghegan wrote:
> On Thu, Aug 3, 2017 at 8:49 AM, Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
> > With query #2 it ends up crashing after ~5hours and produces
> > the log in log-valgrind-2.txt.gz with some other entries than
> > case #1, but AFAICS still all about reading uninitialised values
> > in space allocated by datumCopy().
>
> Right. This part is really interesting to me:
>
> ==48827== Uninitialised value was created by a heap allocation
> ==48827== at 0x4C28C20: malloc (vg_replace_malloc.c:296)
> ==48827== by 0x80B597: AllocSetAlloc (aset.c:771)
> ==48827== by 0x810ADC: palloc (mcxt.c:862)
> ==48827== by 0x72BFEF: datumCopy (datum.c:171)
> ==48827== by 0x81A74C: tuplesort_putdatum (tuplesort.c:1515)
> ==48827== by 0x5E91EB: advance_aggregates (nodeAgg.c:1023)
>
> If you actually go to datum.c:171, you see that that's a codepath for
> pass-by-reference datatypes that lack a varlena header. Text is a
> datatype that has a varlena header, though, so that's clearly wrong. I
> don't know how this actually happened, but working back through the
> relevant tuplesort_begin_datum() caller, initialize_aggregate(), does
> suggest some things. (tuplesort_begin_datum() is where datumTypeLen is
> determined for the entire datum tuplesort.)
>
> I am once again only guessing, but I have to wonder if this is a
> problem in commit b8d7f053. It seems likely that the problem begins
> before tuplesort_begin_datum() is even called, which is the basis of
> this suspicion. If the problem is within tuplesort, then that could
> only be because get_typlenbyval() gives wrong answers, which seems
> very unlikely.
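To make the branch argument above concrete, here is a simplified C model. It is not PostgreSQL source. It shows how a datumCopy()-style routine picks its copy strategy from typByVal and typLen, using a small table that stands in for the pg_type lookup that get_typlenbyval() performs. The names copy_path, lookup_typlenbyval, expected_path and the catalog table are illustrative assumptions. Only the typlen/typbyval values are taken from PostgreSQL's built-in types.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Which branch a datumCopy()-style routine takes for a given type.
 * Simplified model; these names are hypothetical, not PostgreSQL's API. */
typedef enum
{
    COPY_BY_VALUE,   /* typByVal: the Datum itself holds the value        */
    COPY_VARLENA,    /* typLen == -1: size read from the varlena header   */
    COPY_OTHER_REF   /* typLen > 0 or -2: size from typLen or strlen + 1  */
} CopyPath;

CopyPath
copy_path(bool typByVal, int typLen)
{
    if (typByVal)
        return COPY_BY_VALUE;
    if (typLen == -1)
        return COPY_VARLENA;
    /* Pass by reference but not varlena: the kind of branch the trace hit */
    return COPY_OTHER_REF;
}

/* Hypothetical miniature of the pg_type columns get_typlenbyval() reads.
 * The typlen/typbyval values match PostgreSQL's built-in types. */
typedef struct
{
    const char *typname;
    int         typlen;
    bool        typbyval;
} TypeInfo;

static const TypeInfo catalog[] = {
    {"int4", 4, true},
    {"name", 64, false},   /* NAMEDATALEN bytes, fixed size, by reference */
    {"text", -1, false},
    {"cstring", -2, false},
};

/* Look up typlen/typbyval by type name; returns false if not found. */
bool
lookup_typlenbyval(const char *typname, int *typlen, bool *typbyval)
{
    for (size_t i = 0; i < sizeof(catalog) / sizeof(catalog[0]); i++)
    {
        if (strcmp(catalog[i].typname, typname) == 0)
        {
            *typlen = catalog[i].typlen;
            *typbyval = catalog[i].typbyval;
            return true;
        }
    }
    return false;
}

/* Expected copy path for a known type name. With correct catalog data,
 * text always lands in COPY_VARLENA, so reaching COPY_OTHER_REF for a
 * text datum means the typLen/typByVal passed in were not text's. */
CopyPath
expected_path(const char *typname)
{
    int  typlen = 0;
    bool typbyval = false;
    bool found = lookup_typlenbyval(typname, &typlen, &typbyval);

    assert(found);
    (void) found;              /* silence unused warning under NDEBUG */
    return copy_path(typbyval, typlen);
}
```

In this model, a wrong answer from the lookup is the only way text could reach the non-varlena branch. That matches the reasoning above: if the lookup is trusted, the bad type information must have been supplied to the sort by its caller.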

[Action required within three days. This is a generic notification.]

The above-described topic is currently a PostgreSQL 10 open item. Peter
(Eisentraut), since you committed the patch believed to have created it, you
own this open item. If some other commit is more relevant or if this does not
belong as a v10 open item, please let us know. Otherwise, please observe the
policy on open item ownership[1] and send a status update within three
calendar days of this message. Include a date for your subsequent status
update. Testers may discover new open items at any time, and I want to plan
to get them all fixed well in advance of shipping v10. Consequently, I will
appreciate your efforts toward speedy resolution. Thanks.

[1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
