Re: BUG #15476: Problem on show_trgm with 4 byte UTF-8 characters

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	kenji uno <h8mastre(at)gmail(dot)com>
Cc:	pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject:	Re: BUG #15476: Problem on show_trgm with 4 byte UTF-8 characters
Date:	2018-11-03 17:03:20
Message-ID:	3873.1541264600@sss.pgh.pa.us
Lists:	pgsql-bugs

kenji uno <h8mastre(at)gmail(dot)com> writes:
>> I failed to reproduce this on a Linux machine. It looks to me like the
>> problem is that Windows' MultiByteToWideChar doesn't think that UTF8
>> character is valid.

> I'm just wondering why my issue occurs only on Windows.
> But I knew why: char2wchar's tolen requires +1 output buffer size, due to
> null-termination.

Oooh ... the problem, effectively, is that the ts_locale.c functions are
expecting to get back UTF32 but what they'll actually get on Windows is
UTF16. So if the given character is outside the BMP range, char2wchar
needs to produce a surrogate pair, which there's not room for given that
the output buffer can only hold 1 wchar_t plus trailing null.
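(Illustration, not part of the original message.) The arithmetic behind the surrogate-pair point can be sketched as below. This is a minimal sketch in Python rather than the actual char2wchar code; the helper names `utf16_code_units` and `fits_in_buffer` are hypothetical, and the buffer model (a count of wchar_t slots that includes the trailing null) is an assumption taken from the description above.

```python
def utf16_code_units(code_point):
    """Return the UTF-16 code units needed to represent one code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x10000:
        # Inside the BMP: a single 16-bit unit is enough.
        return [code_point]
    # Outside the BMP: split the 20-bit offset into a surrogate pair.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


def fits_in_buffer(code_point, buffer_slots):
    """Hypothetical check: does the character plus a trailing null fit
    into a buffer of `buffer_slots` 16-bit wchar_t elements?"""
    return len(utf16_code_units(code_point)) + 1 <= buffer_slots


# U+2000B is encoded as four bytes in UTF-8 (F0 A0 80 8B).
print([hex(u) for u in utf16_code_units(0x2000B)])
print(fits_in_buffer(0x3042, 2))    # BMP character, 2-slot buffer
print(fits_in_buffer(0x2000B, 2))   # non-BMP character, 2-slot buffer
```

With a 16-bit wchar_t, a two-slot buffer holds any BMP character plus the null, but a non-BMP character needs three slots. With a 32-bit wchar_t every code point is a single unit, which is consistent with the problem not reproducing on Linux.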

Then the other problem is that the Windows-Unicode code path in char2wchar
just fails for an undersized output buffer, which you would not expect
from its documentation. And it fails with a misleading error message,
too.

I'll see what I can do about this --- thanks for the report!

regards, tom lane

In response to

Re: BUG #15476: Problem on show_trgm with 4 byte UTF-8 characters at 2018-11-03 04:44:30 from Tom Lane
