Crash report for some ICU-52 (debian8) COLLATE and work_mem values

From: "Daniel Verite" <daniel(at)manitou-mail(dot)org>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Date: 2017-07-31 16:21:44
Message-ID: f1438ec6-22aa-4029-9a3b-26f79d330e72@manitou-mail.org
Lists: pgsql-bugs pgsql-hackers

Hi,

With PostgreSQL 10beta2 built on Debian 8 with
./configure --enable-debug --with-icu
and the ICU packages currently in the "jessie" Debian repository:

$ dpkg -l 'libicu*'
...
ii  libicu-dev:amd  52.1-8+deb8u  amd64  Development files for Internation
ii  libicu52:amd64  52.1-8+deb8u  amd64  International Components for Unic
ii  libicu52-dbg    52.1-8+deb8u  amd64  International Components for Unic

I've got a table with 6.6 million unique small bits of text from different
Unicode alphabets:

        Table "public.words_test"
  Column  | Type | Collation | Nullable | Default
----------+------+-----------+----------+---------
 wordtext | text |           |          |

and found that running the following query on it consistently
provokes a SIGSEGV with certain collations:

SELECT count(distinct wordtext COLLATE :"collname") FROM words_test;

Some of the collations that crash:
az-Latn-AZ-u-co-search-x-icu
bs-Latn-BA-u-co-search-x-icu
bs-x-icu
cs-CZ-u-co-search-x-icu
de-BE-u-co-phonebk-x-icu
sr-Latn-XK-x-icu
zh-Hans-CN-u-co-big5han-x-icu

Trying all of them, I got 146 crashes out of the 1741 ICU
entries that initdb created in pg_collation.
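
For anyone who wants to repeat the sweep over all ICU collations, here is a minimal sketch in Python. It is not the script used for this report. It assumes psql and pg_isready are on the PATH, that the database is "mlists" (the name visible in the backtrace), and that pg_collation.collprovider = 'i' selects the ICU entries. The function names are made up for illustration. The query is sent on stdin because psql does not interpolate :"collname" in a -c string.

```python
import subprocess
import time

# The query from the report; psql expands :"collname" as a quoted identifier.
QUERY = 'SELECT count(distinct wordtext COLLATE :"collname") FROM words_test;'

# Assumption: ICU collations are the rows with collprovider = 'i'.
LIST_ICU = "SELECT collname FROM pg_collation WHERE collprovider = 'i' ORDER BY 1;"


def psql_argv(dbname, collname=None):
    """Build a psql command line; the SQL itself is sent on stdin."""
    argv = ["psql", "-X", "-q", "-At", "-d", dbname, "-v", "ON_ERROR_STOP=1"]
    if collname is not None:
        argv += ["-v", f"collname={collname}"]
    return argv


def query_script(work_mem):
    """SQL for one run: set work_mem, then run the query."""
    return f"SET work_mem = '{work_mem}';\n{QUERY}\n"


def wait_until_ready(dbname, timeout=60):
    """After a backend crash the server restarts; poll until it accepts connections."""
    deadline = time.time() + timeout
    while time.time() < deadline:
        if subprocess.run(["pg_isready", "-q", "-d", dbname]).returncode == 0:
            return True
        time.sleep(1)
    return False


def failing_collations(dbname="mlists", work_mem="128MB"):
    """Return the collations for which psql exited with a non-zero status."""
    listing = subprocess.run(psql_argv(dbname), input=LIST_ICU + "\n",
                             capture_output=True, text=True, check=True)
    failed = []
    for collname in listing.stdout.splitlines():
        run = subprocess.run(psql_argv(dbname, collname),
                             input=query_script(work_mem),
                             capture_output=True, text=True)
        if run.returncode != 0:
            # Non-zero only means the session failed; the server log
            # has to confirm that the backend died with SIGSEGV.
            failed.append(collname)
            wait_until_ready(dbname)
    return failed
```

Calling failing_collations() returns the list of collation names whose run failed. Each collation gets its own psql session, since a crashed backend takes the session down with it.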

The size of the table is 291MB, and work_mem is set to 128MB.

Reducing the dataset tends to make the problem disappear: if I split
the table in halves based on row_number() to bisect on the data,
the queries on both parts pass without crashing.
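
The split into halves can be sketched as follows. This only generates the SQL statements and is an illustration, not the exact commands I ran: the table names with the _numbered, _lo and _hi suffixes are hypothetical, and row_number() OVER () without ORDER BY simply follows scan order. The rows are numbered once into an intermediate table so that both halves are cut from the same numbering.

```python
def split_statements(src="words_test", col="wordtext"):
    """Return SQL statements that split `src` into two halves by row_number()."""
    numbered = f"{src}_numbered"  # hypothetical intermediate table name
    half = f"(SELECT max(rn) FROM {numbered}) / 2"
    return [
        # Number the rows once, in scan order.
        f"CREATE TABLE {numbered} AS "
        f"SELECT {col}, row_number() OVER () AS rn FROM {src};",
        # First half: rn up to the midpoint.
        f"CREATE TABLE {src}_lo AS "
        f"SELECT {col} FROM {numbered} WHERE rn <= {half};",
        # Second half: everything after the midpoint.
        f"CREATE TABLE {src}_hi AS "
        f"SELECT {col} FROM {numbered} WHERE rn > {half};",
    ]
```

Printing the returned statements and feeding them to psql gives two tables on which the same count(distinct ...) query can be run separately.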

Below is a backtrace obtained with COLLATE "az-Latn-AZ-u-co-search-x-icu"
and work_mem set to 128MB. Raising work_mem to 512MB avoids
the crash, but 300MB is not enough.
(By comparison, the same query with COLLATE "en_US.utf8" or "fr-x-icu"
runs fine with work_mem set to 4MB.)
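
To narrow down the work_mem threshold between a value known to crash and one known to pass, a bisection like the sketch below could be used. It rests on an assumption that this report does not establish: that the crash depends monotonically on work_mem (every value below some threshold crashes, every value above passes). The crashes callback is hypothetical and stands for "run the query with this many MB of work_mem and report whether the backend died".

```python
def smallest_passing_work_mem(crashes, lo_mb=300, hi_mb=512):
    """Bisect for the smallest work_mem (in MB) at which the query passes.

    lo_mb must be a value known to crash and hi_mb a value known to pass.
    Assumes crashes(mb) is monotonic, which is NOT confirmed by the report.
    """
    while hi_mb - lo_mb > 1:
        mid = (lo_mb + hi_mb) // 2
        if crashes(mid):
            lo_mb = mid  # still crashing: the threshold is above mid
        else:
            hi_mb = mid  # passing: the threshold is at or below mid
    return hi_mb
```

If the behaviour turns out not to be monotonic, the result is only one boundary among possibly several, and a linear sweep over work_mem values would be more informative.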

#0 0x00007fc3c5017030 in ucol_getLatinOneContractionUTF8 (
coll=coll@entry=0x2824ef0, strength=strength@entry=0, CE=<optimized out>,
s=s@entry=0x383a3ec
"u\364\217\273\252tectuablesow-what-it-is-about_tlsstnamek\224\257\346\214\201gb2312\357\274\214\202\232\204\344\271\237\351\201\207\345\210\260\350\277\207\344\270\200\344\272\233\345\205\266\345\256\203\351\227\256\351\242\230\357\274\214\345\233\240\344\270\272\346\262\241\346\234\211\350\277\231\344\270\252\351\227\256\351\242\230\344\270\245\351\207\215\357\274\214\350\277\230\347\256\227\344\270\215\351\224\231\343\200\202\200\342\224\200",
index=index@entry=0x7fff46afe290, len=len@entry=7) at ucol.cpp:8044
#1 0x00007fc3c502917c in ucol_strcollUseLatin1UTF8 (status=0x7fff46afe338,
tLen=7,
target=0x383a3ec
"u\364\217\273\252tectuablesow-what-it-is-about_tlsstnamek\224\257\346\214\201gb2312\357\274\214\202\232\204\344\271\237\351\201\207\345\210\260\350\277\207\344\270\200\344\272\233\345\205\266\345\256\203\351\227\256\351\242\230\357\274\214\345\233\240\344\270\272\346\262\241\346\234\211\350\277\231\344\270\252\351\227\256\351\242\230\344\270\245\351\207\215\357\274\214\350\277\230\347\256\227\344\270\215\351\224\231\343\200\202\200\342\224\200",
sLen=6,
source=0x3839bec
"wuiredntnsookiemmand-tcl-testsuit-tp65374p65375\204\346\226\231\344\271\237\345\276\210\345\244\232\343\200\202\257debian\345\222\214redhat\344\272\206\357\274\214\344\270\244\344\270\252\215\263\345\264\251\346\272\203\343\200\202\226\350\257\221\343\200\202\204u\347\233\230\357\274\214\272\346\211\213\344\272\206\344\270\200\344\272\233\343\200\202ct\342\224\200\342\224\230",
coll=0x2824ef0) at ucol.cpp:8153
#2 ucol_strcollUTF8_52 (coll=<optimized out>,
source=0x3839bec
"wuiredntnsookiemmand-tcl-testsuit-tp65374p65375\204\346\226\231\344\271\237\345\276\210\345\244\232\343\200\202\257debian\345\222\214redhat\344\272\206\357\274\214\344\270\244\344\270\252\215\263\345\264\251\346\272\203\343\200\202\226\350\257\221\343\200\202\204u\347\233\230\357\274\214\272\346\211\213\344\272\206\344\270\200\344\272\233\343\200\202ct\342\224\200\342\224\230",
source@entry=0x3839be9
"reqwuiredntnsookiemmand-tcl-testsuit-tp65374p65375\204\346\226\231\344\271\237\345\276\210\345\244\232\343\200\202\257debian\345\222\214redhat\344\272\206\357\274\214\344\270\244\344\270\252\215\263\345\264\251\346\272\203\343\200\202\226\350\257\221\343\200\202\204u\347\233\230\357\274\214\272\346\211\213\344\272\206\344\270\200\344\272\233\343\200\202ct\342\224\200\342\224\230",
sourceLength=<optimized out>, sourceLength@entry=9,
target=0x383a3ec
"u\364\217\273\252tectuablesow-what-it-is-about_tlsstnamek\224\257\346\214\201gb2312\357\274\214\202\232\204\344\271\237\351\201\207\345\210\260\350\277\207\344\270\200\344\272\233\345\205\266\345\256\203\351\227\256\351\242\230\357\274\214\345\233\240\344\270\272\346\262\241\346\234\211\350\277\231\344\270\252\351\227\256\351\242\230\344\270\245\351\207\215\357\274\214\350\277\230\347\256\227\344\270\215\351\224\231\343\200\202\200\342\224\200",
target@entry=0x383a3e9
"requ\364\217\273\252tectuablesow-what-it-is-about_tlsstnamek\224\257\346\214\201gb2312\357\274\214\202\232\204\344\271\237\351\201\207\345\210\260\350\277\207\344\270\200\344\272\233\345\205\266\345\256\203\351\227\256\351\242\230\357\274\214\345\233\240\344\270\272\346\262\241\346\234\211\350\277\231\344\270\252\351\227\256\351\242\230\344\270\245\351\207\215\357\274\214\350\277\230\347\256\227\344\270\215\351\224\231\343\200\202\200\342\224\200",
targetLength=7, targetLength@entry=10,
status=status@entry=0x7fff46afe338) at ucol.cpp:8770
#3 0x00000000007cc7b4 in varstrfastcmp_locale (x=58956776, y=58958824,
ssup=<optimized out>) at varlena.c:2139
#4 0x00000000008170b6 in ApplySortComparator (ssup=0x28171b8,
isNull2=<optimized out>, datum2=<optimized out>, isNull1=<optimized out>,
datum1=<optimized out>) at
../../../../src/include/utils/sortsupport.h:225
#5 comparetup_datum (a=0x2818ef8, b=0x2818f10, state=0x2816fa8)
at tuplesort.c:4341
#6 0x0000000000815623 in tuplesort_heap_replace_top (
state=state@entry=0x2816fa8, tuple=tuple@entry=0x7fff46afe410,
checkIndex=checkIndex@entry=0 '\000') at tuplesort.c:3510
#7 0x0000000000816d8c in tuplesort_gettuple_common (
state=state@entry=0x2816fa8, forward=forward@entry=1 '\001',
stup=stup@entry=0x7fff46afe460) at tuplesort.c:2082
#8 0x000000000081b176 in tuplesort_getdatum (state=0x2816fa8,
forward=forward@entry=1 '\001', val=val@entry=0x28130f0,
isNull=isNull@entry=0x2813409 "", abbrev=abbrev@entry=0x7fff46afe518)
at tuplesort.c:2205
#9 0x00000000005ea107 in process_ordered_aggregate_single (
pergroupstate=0x2812f38, pertrans=0x2812f98, aggstate=0x2811198)
at nodeAgg.c:1330
#10 finalize_aggregates (aggstate=aggstate@entry=0x2811198,
peraggs=peraggs@entry=0x2812588, pergroup=<optimized out>)
at nodeAgg.c:1736
#11 0x00000000005eaabd in agg_retrieve_direct (aggstate=0x2811198)
at nodeAgg.c:2464
#12 ExecAgg (node=node@entry=0x2811198) at nodeAgg.c:2117
#13 0x00000000005e2378 in ExecProcNode (node=node@entry=0x2811198)
at execProcnode.c:539
#14 0x00000000005ddf1e in ExecutePlan (execute_once=<optimized out>,
dest=0x27f7eb0, direction=<optimized out>, numberTuples=0,
sendTuples=<optimized out>, operation=CMD_SELECT,
use_parallel_mode=<optimized out>, planstate=0x2811198, estate=0x2810f88)
at execMain.c:1693
#15 standard_ExecutorRun (queryDesc=0x280d6d8, direction=<optimized out>,
count=0, execute_once=<optimized out>) at execMain.c:362
#16 0x00000000006f924c in PortalRunSelect (portal=portal@entry=0x280ef78,
forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807,
dest=dest@entry=0x27f7eb0) at pquery.c:932
#17 0x00000000006fa5f0 in PortalRun (portal=0x280ef78,
count=9223372036854775807, isTopLevel=<optimized out>,
run_once=<optimized out>, dest=0x27f7eb0, altdest=0x27f7eb0,
completionTag=0x7fff46afe830 "") at pquery.c:773
#18 0x00000000006f67c3 in exec_simple_query (
query_string=0x383a3ec
"u\364\217\273\252tectuablesow-what-it-is-about_tlsstnamek\224\257\346\214\201gb2312\357\274\214\202\232\204\344\271\237\351\201\207\345\210\260\350\277\207\344\270\200\344\272\233\345\205\266\345\256\203\351\227\256\351\242\230\357\274\214\345\233\240\344\270\272\346\262\241\346\234\211\350\277\231\344\270\252\351\227\256\351\242\230\344\270\245\351\207\215\357\274\214\350\277\230\347\256\227\344\270\215\351\224\231\343\200\202\200\342\224\200")
at postgres.c:1099
#19 0x00000000006f842a in PostgresMain (argc=1, argv=0x27d5f68,
dbname=0x2752288 "mlists", username=0x276a298 "postgres")
at postgres.c:4090
#20 0x000000000047803f in BackendRun (port=0x274bf30) at postmaster.c:4357
#21 BackendStartup (port=0x274bf30) at postmaster.c:4029
#22 ServerLoop () at postmaster.c:1753
#23 0x0000000000692a82 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x2724330) at postmaster.c:1361
#24 0x0000000000478f8e in main (argc=3, argv=0x2724330) at main.c:228

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
