Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Date: 2017-07-31 17:37:10
Message-ID: 20170731173710.GA19829@marmot
Lists: pgsql-bugs pgsql-hackers

Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
> SELECT count(distinct wordtext COLLATE :"collname") FROM words_test;
>
> Some of the collations that crash:
> az-Latn-AZ-u-co-search-x-icu
> bs-Latn-BA-u-co-search-x-icu
> bs-x-icu
> cs-CZ-u-co-search-x-icu
> de-BE-u-co-phonebk-x-icu
> sr-Latn-XK-x-icu
> zh-Hans-CN-u-co-big5han-x-icu
>
> Trying all of them I had 146 crashes out of the 1741 ICU
> entries in pg_collation created by initdb.
>
> The size of the table is 291MB, and work_mem is set to 128MB.
>
> Reducing the dataset tends to make the problem disappear: if I split
> the table in halves based on row_number() to bisect on the data,
> the queries on both parts pass without crashing.

I think that this sensitivity to work_mem exists because abbreviated
keys are used for quicksort operations that sort individual runs.
As work_mem is increased, and less merging is required, affected
codepaths are reached less frequently. You would probably find that the
problem appears more consistently if varstr_sortsupport() is modified so
that even ICU collations never use abbreviated keys; that would be a
matter of "abbreviate" always being set to false within that function.

I suggest using the new amcheck contrib module as part of this testing
(you'll need to use CREATE INDEX to have an index to perform
verification against). This will zero in on inconsistencies that may be
far more subtle than a hard crash. I wouldn't assume that abbreviated
key comparisons are correct here just because there is no hard crash.
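
A minimal sketch of that amcheck run, assuming the words_test table and
the :"collname" psql variable from the report. The index name
words_test_coll_idx is made up for illustration. Note that CREATE INDEX
performs a sort of its own, so it may hit the same crash before any
verification takes place:

```sql
-- amcheck ships as a contrib module; install it in the test database.
CREATE EXTENSION IF NOT EXISTS amcheck;

-- Hypothetical index name. The collation comes from the same psql
-- variable used in the original count(distinct ...) query.
CREATE INDEX words_test_coll_idx
    ON words_test (wordtext COLLATE :"collname");

-- Verify that index tuples are in an order consistent with the
-- collation's comparator. Raises an error on any inconsistency.
SELECT bt_index_check('words_test_coll_idx'::regclass);

-- Stricter variant that also checks parent/child page relationships.
-- It takes a heavier lock, which should be fine on a test system.
SELECT bt_index_parent_check('words_test_coll_idx'::regclass);
```

Both functions return nothing on success, so an error from either one is
the signal to look for.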

Does the crash always have ucol_strcollUseLatin1UTF8() in its backtrace?

--
Peter Geoghegan
