Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values

From: Peter Geoghegan <pg(at)bowt(dot)ie>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: PostgreSQL mailing lists <pgsql-bugs(at)postgresql(dot)org>, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Subject: Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Date: 2017-08-01 20:42:33
Message-ID: CAH2-WzktoNj4uBhJq+5y9puLRq7bHuK=7S+MQKcbgnG4M6A9cg@mail.gmail.com
Lists: pgsql-bugs pgsql-hackers

On Tue, Aug 1, 2017 at 12:45 AM, Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
> The test that iterates over collations produces two kinds of core files,
> some of them are 289MB large, some others are 17GB large.
> shared_buffers is only 128MB and work_mem 128MB,
> so 289MB is not surprising but 17GB seems excessive.
> The box has 16GB of physical mem and 8GB of swap.
>
> I haven't checked all core files because they exhaust the disk
> space before completion of the test, but a typical backtrace for
> the biggest ones looks like the following, with the segfaults
> happening in memcpy:
>
> Program terminated with signal SIGSEGV, Segmentation fault.
> #0 __memcpy_sse2_unaligned ()
> at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:35
> (gdb) #0 __memcpy_sse2_unaligned ()
> at ../sysdeps/x86_64/multiarch/memcpy-sse2-unaligned.S:35
> #1 0x00007fc1db02be6b in memcpy (__len=8589934592, __src=0x7fbc6f4a2010,
> __dest=<optimized out>) at
> /usr/include/x86_64-linux-gnu/bits/string3.h:51
> #2 ucol_CEBuf_Expand (ci=<optimized out>, status=0x7ffd90777128,
> b=0x7ffd907751e0) at ucol.cpp:7009
> #3 UCOL_CEBUF_PUT (status=0x7ffd90777128, ci=0x7ffd90776460, ce=1493173509,
> b=0x7ffd907751e0) at ucol.cpp:7022
> #4 ucol_strcollRegular (sColl=sColl@entry=0x7ffd90776460,
> tColl=tColl@entry=0x7ffd90776610, status=status@entry=0x7ffd90777128)
> at ucol.cpp:7163
> #5 0x00007fc1db031177 in ucol_strcollRegularUTF8 (coll=0x1371af0,
> source=source@entry=0x273d379 "콗喩zx㎍",
> sourceLength=sourceLength@entry=11, target=<optimized out>,
> targetLength=targetLength@entry=8, status=status@entry=0x7ffd90777128)
> at ucol.cpp:8023
> #6 0x00007fc1db032d36 in ucol_strcollUseLatin1UTF8 (status=<optimized out>,
> tLen=<optimized out>, target=<optimized out>, sLen=<optimized out>,
> source=<optimized out>, coll=<optimized out>) at ucol.cpp:8108
> #7 ucol_strcollUTF8_52 (coll=<optimized out>,
> source=source@entry=0x273d379 "콗喩zx㎍", sourceLength=<optimized out>,
> sourceLength@entry=11, target=<optimized out>,
> target@entry=0x273d409 "쳭喩zz", targetLength=targetLength@entry=8,
> status=status@entry=0x7ffd90777128) at ucol.cpp:8770

Interesting. The "__len" argument to memcpy() is 8589934592 -- that's
2 ^ 33. (I'm not sure why it's the first memcpy() argument in the
stack trace, since it's supposed to be the last -- anyone seen that
before?)

Can you figure out what the optimized-out lengths are, by either
looking at registers within GDB, or building at a lower optimization
level?

Maybe this is a bug in ICU-52. For reference, here is ICU-52's
ucol_CEBuf_Expand() function:

static
void ucol_CEBuf_Expand(ucol_CEBuf *b, collIterate *ci, UErrorCode *status) {
    uint32_t oldSize;
    uint32_t newSize;
    uint32_t *newBuf;

    ci->flags |= UCOL_ITER_ALLOCATED;
    oldSize = (uint32_t)(b->pos - b->buf);
    newSize = oldSize * 2;
    newBuf = (uint32_t *)uprv_malloc(newSize * sizeof(uint32_t));
    if(newBuf == NULL) {
        *status = U_MEMORY_ALLOCATION_ERROR;
    }
    else {
        uprv_memcpy(newBuf, b->buf, oldSize * sizeof(uint32_t));
        if (b->buf != b->localArray) {
            uprv_free(b->buf);
        }
        b->buf = newBuf;
        b->endp = b->buf + newSize;
        b->pos = b->buf + oldSize;
    }
}

If "oldSize * sizeof(uint32_t)" becomes what we see as "__len", as I
believe it does, then that must mean that oldSize is 2 ^ 31. *Not* 2 ^
31 - 1 (INT_MAX). I think that this could be an off-by-one bug, since
ucol_strcollUTF8()/ucol_strcollUTF8_52() accepts an int32 argument for
sourceLength and targetLength. I'm not very confident of this, but it
does make a certain amount of sense. It could be that everyone else is
passing -1 as sourceLength and targetLength arguments, anyway, to
indicate that the buffer is NUL-terminated, as required by regular
strcoll().

Note also that the docs say this of ucol_strcollUTF8(): "When input
string contains malformed a UTF-8 byte sequence, this function treats
these bytes as REPLACEMENT CHARACTER (U+FFFD)". I'm not sure that
that's a very sensible way for it to fail.

I'd be interested to see if anything changed when -1 was passed as
both sourceLength and targetLength to ucol_strcollUTF8(). You'd have
to build Postgres yourself to test this, but it would just work, since
we don't actually avoid NUL termination, even though in principle we
could with ICU.

--
Peter Geoghegan
