Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values

From: Noah Misch <noah(at)leadboat(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>, Peter Geoghegan <pg(at)bowt(dot)ie>, Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Date: 2017-08-05 22:56:59
Message-ID: 20170805225659.GA3178094@rfd.leadboat.com
Lists: pgsql-bugs pgsql-hackers

Adding -hackers.

On Sat, Aug 05, 2017 at 03:55:13PM -0700, Noah Misch wrote:
> On Thu, Aug 03, 2017 at 11:42:25AM -0700, Peter Geoghegan wrote:
> > On Thu, Aug 3, 2017 at 8:49 AM, Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
> > > With query #2 it ends up crashing after ~5 hours and produces
> > > the log in log-valgrind-2.txt.gz, with some entries different from
> > > case #1, but AFAICS they are still all about reading uninitialised values
> > > in space allocated by datumCopy().
> >
> > Right. This part is really interesting to me:
> >
> > ==48827== Uninitialised value was created by a heap allocation
> > ==48827== at 0x4C28C20: malloc (vg_replace_malloc.c:296)
> > ==48827== by 0x80B597: AllocSetAlloc (aset.c:771)
> > ==48827== by 0x810ADC: palloc (mcxt.c:862)
> > ==48827== by 0x72BFEF: datumCopy (datum.c:171)
> > ==48827== by 0x81A74C: tuplesort_putdatum (tuplesort.c:1515)
> > ==48827== by 0x5E91EB: advance_aggregates (nodeAgg.c:1023)
> >
> > If you actually go to datum.c:171, you see that that's a codepath for
> > pass-by-reference datatypes that lack a varlena header. Text is a
> > datatype that has a varlena header, though, so that's clearly wrong. I
> > don't know how this actually happened, but working back through the
> > relevant tuplesort_begin_datum() caller, initialize_aggregate(), does
> > suggest some things. (tuplesort_begin_datum() is where datumTypeLen is
> > determined for the entire datum tuplesort.)
> >
> > I am once again only guessing, but I have to wonder if this is a
> > problem in commit b8d7f053. It seems likely that the problem begins
> > before tuplesort_begin_datum() is even called, which is the basis of
> > this suspicion. If the problem is within tuplesort, then that could
> > only be because get_typlenbyval() gives wrong answers, which seems
> > very unlikely.
>
> [Action required within three days. This is a generic notification.]
>
> The above-described topic is currently a PostgreSQL 10 open item. Peter
> (Eisentraut), since you committed the patch believed to have created it, you
> own this open item. If some other commit is more relevant or if this does not
> belong as a v10 open item, please let us know. Otherwise, please observe the
> policy on open item ownership[1] and send a status update within three
> calendar days of this message. Include a date for your subsequent status
> update. Testers may discover new open items at any time, and I want to plan
> to get them all fixed well in advance of shipping v10. Consequently, I will
> appreciate your efforts toward speedy resolution. Thanks.
>
> [1] https://www.postgresql.org/message-id/20170404140717.GA2675809%40tornado.leadboat.com
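Peter Geoghegan's reading of the Valgrind trace depends on how datumCopy() picks a code path from the type length (typLen) that tuplesort_begin_datum() fixes for the whole datum sort. The following is a minimal, hypothetical C sketch of that dispatch, not PostgreSQL's code. The function names, the SKETCH_* constants, and the simplified 4-byte varlena header are all assumptions. By-value types, which datumCopy() returns unchanged, are left out. The sketch shows why a text value (typLen -1, size read from its varlena header) should never reach the fixed-length, non-varlena branch.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for typLen values; these names are not PostgreSQL's. */
#define SKETCH_VARLENA (-1)  /* like typLen == -1: size comes from a header */
#define SKETCH_CSTRING (-2)  /* like typLen == -2: NUL-terminated string */

/*
 * Simplified varlena (assumption): a 4-byte native-endian length word that
 * counts itself, followed by the payload. Real varlenas also have 1-byte
 * short headers, compression and TOAST pointers, all ignored here.
 */
static size_t sketch_varlena_size(const void *p)
{
    uint32_t len;
    memcpy(&len, p, sizeof len);
    return (size_t) len;
}

/* Bytes occupied by a by-reference datum, decided by typ_len alone. */
static size_t sketch_datum_size(const void *p, int typ_len)
{
    if (typ_len > 0)
        return (size_t) typ_len;              /* fixed-length, no header */
    if (typ_len == SKETCH_VARLENA)
        return sketch_varlena_size(p);        /* trust the length header */
    if (typ_len == SKETCH_CSTRING)
        return strlen((const char *) p) + 1;  /* include the terminator */
    abort();                                  /* invalid typ_len */
}

/*
 * Copy a by-reference datum into fresh heap memory (caller frees).
 * The size is derived only from typ_len, so a wrong typ_len yields a
 * wrong size: treating a short text varlena as a larger fixed-length
 * type would read past the end of the source value.
 */
static void *sketch_datum_copy(const void *p, int typ_len)
{
    assert(p != NULL);
    size_t n = sketch_datum_size(p, typ_len);
    void *res = malloc(n);
    if (res != NULL)
        memcpy(res, p, n);
    return res;
}
```

For a 9-byte varlena holding "hello", sketch_datum_copy(buf, SKETCH_VARLENA) copies exactly 9 bytes. Passing a larger positive typ_len for the same buffer would instead read beyond it, which is the kind of memory access Valgrind's Memcheck is designed to flag.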
