Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: "Peter Geoghegan" <pg(at)bowt(dot)ie>
Cc: "PostgreSQL mailing lists" <pgsql-bugs(at)postgresql(dot)org>,"Peter Eisentraut" <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Date: 2017-08-03 15:49:50
Message-ID: 71cd77ab-a421-443f-8536-821be9bd15ed@manitou-mail.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Peter Geoghegan wrote:

> it's not that hard to imagine that new bugs were introduced in the process
> of backpatching upstream fixes from later major ICU releases

I've now tried with ICU-52.1 from upstream, built with
gcc -O0 -g to get cleaner stack traces.
The behavior is similar to debian's precompiled lib in how and where
it crashes.

work_mem is at 128MB.

query #1:
select count(distinct wordtext collate "az-x-icu") from words_test;

The backend terminates with SIGSEGV and a 289MB core file :
at ucol.cpp:8033
8033 while(schar > (tchar = *(UCharOffset+offset))) { /* since the
contraction codepoints should be ordered, we skip all that are smaller */

#0 0x00007f7926e12341 in ucol_getLatinOneContractionUTF8 (coll=0x1b422c0,
strength=0, CE=4061217513,
s=0x2b56b5c
"u\364\217\273\252tectuablesow-what-it-is-about_tlsstnamek\224\257\346\214\201gb2312\357\274\214\202\232\204\344\271\237\351\201\207\345\210\260\350\277\207\344\270\200\344\272\233\345\205\266\345\256\203\351\227\256\351\242\230\357\274\214\345\233\240\344\270\272\346\262\241\346\234\211\350\277\231\344\270\252\351\227\256\351\242\230\344\270\245\351\207\215\357\274\214\350\277\230\347\256\227\344\270\215\351\224\231\343\200\202\200\342\224\220",
index=0x7ffde80dc978, len=7) at ucol.cpp:8033
#1 0x00007f7926e12845 in ucol_strcollUseLatin1UTF8 (coll=0x1b422c0,
source=0x2b56b5c
"u\364\217\273\252tectuablesow-what-it-is-about_tlsstnamek\224\257\346\214\201gb2312\357\274\214\202\232\204\344\271\237\351\201\207\345\210\260\350\277\207\344\270\200\344\272\233\345\205\266\345\256\203\351\227\256\351\242\230\357\274\214\345\233\240\344\270\272\346\262\241\346\234\211\350\277\231\344\270\252\351\227\256\351\242\230\344\270\245\351\207\215\357\274\214\350\277\230\347\256\227\344\270\215\351\224\231\343\200\202\200\342\224\220",
sLen=7,
target=0x2b5735c
"wuiredntnsookiemmand-tcl-testsuit-tp65374p65375\204\346\226\231\344\271\237\345\276\210\345\244\232\343\200\202\257debian\345\222\214redhat\344\272\206\357\274\214\344\270\244\344\270\252\215\263\345\264\251\346\272\203\343\200\202\273\345\205\263\351\227\255\347\232\204\351\202\243\344\270\252\346\217\220\347\244\272\347\252\227\345\234\250\346\234\200\345\272\225\344\270\213\222\230",
tLen=6, status=0x7ffde80dcab8) at ucol.cpp:8104
#2 0x00007f7926e1487a in ucol_strcollUTF8_52 (coll=0x1b422c0,
source=0x2b56b5c
"u\364\217\273\252tectuablesow-what-it-is-about_tlsstnamek\224\257\346\214\201gb2312\357\274\214\202\232\204\344\271\237\351\201\207\345\210\260\350\277\207\344\270\200\344\272\233\345\205\266\345\256\203\351\227\256\351\242\230\357\274\214\345\233\240\344\270\272\346\262\241\346\234\211\350\277\231\344\270\252\351\227\256\351\242\230\344\270\245\351\207\215\357\274\214\350\277\230\347\256\227\344\270\215\351\224\231\343\200\202\200\342\224\220",
sourceLength=7,
target=0x2b5735c
"wuiredntnsookiemmand-tcl-testsuit-tp65374p65375\204\346\226\231\344\271\237\345\276\210\345\244\232\343\200\202\257debian\345\222\214redhat\344\272\206\357\274\214\344\270\244\344\270\252\215\263\345\264\251\346\272\203\343\200\202\273\345\205\263\351\227\255\347\232\204\351\202\243\344\270\252\346\217\220\347\244\272\347\252\227\345\234\250\346\234\200\345\272\225\344\270\213\222\230",
targetLength=6, status=0x7ffde80dcab8) at ucol.cpp:8759
#3 0x00000000007cc7b4 in varstrfastcmp_locale (x=45443928, y=45445976,
ssup=<optimized out>) at varlena.c:2139
#4 0x00000000008170b6 in ApplySortComparator (ssup=0x1b344a8,
isNull2=<optimized out>, datum2=<optimized out>, isNull1=<optimized out>,
datum1=<optimized out>) at
../../../../src/include/utils/sortsupport.h:225
#5 comparetup_datum (a=0x7ffde80dcb90, b=0x1b36218, state=0x1b34298)
at tuplesort.c:4341
#6 0x00000000008154d9 in tuplesort_heap_replace_top (
state=state(at)entry=0x1b34298, tuple=tuple(at)entry=0x7ffde80dcb90,
checkIndex=checkIndex(at)entry=0 '\000') at tuplesort.c:3512
...cut...

query #2: select count(distinct wordtext collate
"bs-Cyrl-BA-u-co-search-x-icu")
from words_test;

The backend terminates with SIGSEGV leaving a 17GB core file
(there's definitely a uint32 offset overflow from 2^31->0
in ucol_CEBuf_Expand just before the crash in memcpy)

#0 __memcpy_sse2_unaligned ()
at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:35
#1 0x00007fbad6951ec3 in ucol_CEBuf_Expand (b=0x7ffdef2d2390,
ci=0x7ffdef2d3560, status=0x7ffdef2d41e8) at ucol.cpp:6998
#2 0x00007fbad6951f68 in UCOL_CEBUF_PUT (b=0x7ffdef2d2390, ce=1493173509,
ci=0x7ffdef2d3560, status=0x7ffdef2d41e8) at ucol.cpp:7011
#3 0x00007fbad69525c7 in ucol_strcollRegular (sColl=0x7ffdef2d3560,
tColl=0x7ffdef2d3710, status=0x7ffdef2d41e8) at ucol.cpp:7152
#4 0x00007fbad6954080 in ucol_strcollRegularUTF8 (coll=0xf75610,
source=0x23318a9 "\354\275\227\345\226\251zx\343\216\215",
sourceLength=11, target=0x2331939 "\354\263\255\345\226\251zz",
targetLength=8, status=0x7ffdef2d41e8) at ucol.cpp:8012
#5 0x00007fbad6956845 in ucol_strcollUTF8_52 (coll=0xf75610,
source=0x23318a9 "\354\275\227\345\226\251zx\343\216\215",
sourceLength=11, target=0x2331939 "\354\263\255\345\226\251zz",
targetLength=8, status=0x7ffdef2d41e8) at ucol.cpp:8757
#6 0x00000000007cc7b4 in varstrfastcmp_locale (x=36903080, y=36903224,
ssup=<optimized out>) at varlena.c:2139
#7 0x0000000000817138 in ApplySortAbbrevFullComparator (ssup=0xf67888,
isNull2=<optimized out>, datum2=<optimized out>, isNull1=<optimized out>,
datum1=<optimized out>) at
../../../../src/include/utils/sortsupport.h:263
#8 comparetup_datum (a=0x7fbac1cd6210, b=0x7fbac1cd6228,
state=<optimized out>) at tuplesort.c:4350
#9 0x0000000000815aef in qsort_tuple (a=0x7fbac1cd6210, n=<optimized out>,
n(at)entry=2505, cmp_tuple=cmp_tuple(at)entry=0x817030 <comparetup_datum>,
state=state(at)entry=0xf67678) at qsort_tuple.c:104
#10 0x0000000000815c1f in qsort_tuple (a=0x7fbac1cbc020, n=<optimized out>,
n(at)entry=5279, cmp_tuple=cmp_tuple(at)entry=0x817030 <comparetup_datum>,
state=state(at)entry=0xf67678) at qsort_tuple.c:191
...cut...

Next I've tried with valgrind, invoked with:

valgrind \
--suppressions=$HOME/src/postgresql-10beta2/src/tools/valgrind.supp \
--trace-children=yes --track-origins=yes --error-limit=no \
--read-var-info=yes --log-file=/tmp/log-valgrind \
bin/postgres -D $PWD/data

(and turning off autovacuum because it causes some unrelated reports).

With query #1 the run goes to completion without crashing (hi heisenbug!),
but it reports many uses of uninitialised values under ucol_strcollUTF8().
The full log is attached in log-valgrind-1.txt.gz

With query #2 it ends up crashing after ~5hours and produces
the log in log-valgrind-2.txt.gz with some other entries than
case #1, but AFAICS still all about reading uninitialised values
in space allocated by datumCopy().

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite

Attachment Content-Type Size
log-valgrind-2.txt.gz application/octet-stream 15.3 KB
log-valgrind-1.txt.gz application/octet-stream 13.0 KB

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2017-08-03 15:52:10 Re: BUG #14770: postgres_fdw assumes foreign table is in postgres
Previous Message Tom Lane 2017-08-03 15:41:51 Re: "could not open relation with OID XXX" when using recreated index in sql function

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2017-08-03 15:51:43 Re: On Complex Source Code Reading Strategy
Previous Message Alvaro Herrera 2017-08-03 15:45:34 Re: Macros bundling RELKIND_* conditions