Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>
Subject: Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Date: 2017-08-03 18:42:25
Message-ID: CAH2-Wzn1OCy9Wu594A2hrJvCOtGsp-xS6e0hiVsNNYBntWr=rQ@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Thu, Aug 3, 2017 at 8:49 AM, Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
> With query #2 it ends up crashing after ~5hours and produces
> the log in log-valgrind-2.txt.gz with some other entries than
> case #1, but AFAICS still all about reading uninitialised values
> in space allocated by datumCopy().

Right. This part is really interesting to me:

==48827==  Uninitialised value was created by a heap allocation
==48827==    at 0x4C28C20: malloc (vg_replace_malloc.c:296)
==48827==    by 0x80B597: AllocSetAlloc (aset.c:771)
==48827==    by 0x810ADC: palloc (mcxt.c:862)
==48827==    by 0x72BFEF: datumCopy (datum.c:171)
==48827==    by 0x81A74C: tuplesort_putdatum (tuplesort.c:1515)
==48827==    by 0x5E91EB: advance_aggregates (nodeAgg.c:1023)

If you actually go to datum.c:171, you see that it's the codepath for
pass-by-reference datatypes that lack a varlena header. Text is a
datatype that has a varlena header, though, so that's clearly wrong. I
don't know how this actually happened, but working back through the
relevant tuplesort_begin_datum() caller, initialize_aggregate(), does
suggest some things. (tuplesort_begin_datum() is where datumTypeLen is
determined for the entire datum tuplesort.)
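
To make the reasoning above concrete, here is a minimal standalone sketch
of how a copy routine can dispatch on a type's (typbyval, typlen) pair.
It is a simplified model and not the PostgreSQL source: the names
CopyPath, choose_copy_path, DatumSortModel, sort_begin and sort_put_path
are hypothetical. The only assumption taken from PostgreSQL is the usual
convention that typlen is -1 for varlena types and -2 for cstring.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout: a simplified model, not PostgreSQL code. */
typedef enum CopyPath {
    COPY_BY_VALUE,        /* value fits in the datum word itself */
    COPY_VARLENA,         /* typlen == -1: size read from the varlena header */
    COPY_BYREF_NO_HEADER  /* fixed-length or cstring: size derived from typlen */
} CopyPath;

/* Pick the copy path from the type's (typbyval, typlen) pair. */
static CopyPath
choose_copy_path(bool typbyval, int typlen)
{
    /* Valid lengths: positive fixed size, -1 (varlena), -2 (cstring). */
    assert(typlen > 0 || typlen == -1 || typlen == -2);

    if (typbyval)
        return COPY_BY_VALUE;
    if (typlen == -1)
        return COPY_VARLENA;
    return COPY_BYREF_NO_HEADER;
}

/* Model of a datum sort that records the type length once, at begin time. */
typedef struct DatumSortModel {
    bool typbyval;
    int  typlen;
} DatumSortModel;

static DatumSortModel
sort_begin(bool typbyval, int typlen)
{
    DatumSortModel s = { typbyval, typlen };
    return s;
}

/* Every later "put" reuses whatever was recorded by sort_begin(). */
static CopyPath
sort_put_path(const DatumSortModel *s)
{
    return choose_copy_path(s->typbyval, s->typlen);
}
```

In this model, a sort begun with (false, -1), as text would be, can only
ever reach COPY_VARLENA. Reaching COPY_BYREF_NO_HEADER means the length
recorded at begin time was already something other than -1, which is why
the caller looks more suspicious than the sort itself.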

I am once again only guessing, but I have to wonder if this is a
problem in commit b8d7f053. It seems likely that the problem begins
before tuplesort_begin_datum() is even called, which is the basis of
this suspicion. If the problem is within tuplesort, then that could
only be because get_typlenbyval() gives wrong answers, which seems
very unlikely.

--
Peter Geoghegan
