Re: BUG #8970: ts_parse incorrectly split numbers in digit token

From: Marco Atzeri <marco(dot)atzeri(at)gmail(dot)com>
To: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #8970: ts_parse incorrectly split numbers in digit token
Date: 2014-02-01 20:16:39
Message-ID: 52ED5627.4070005@gmail.com
Lists: pgsql-bugs

On 26/01/2014 18:27, Tom Lane wrote:
> Marco Atzeri <marco(dot)atzeri(at)gmail(dot)com> writes:
>> On 26/01/2014 03:25, Alvaro Herrera wrote:
>>> To trace this, I would look at src/backend/tsearch/wparser_def.c;
>>> probably try compiling that file with WPARSER_TRACE defined, and compare
>>> the output of ts_parse() in something simple such as '345' in a working
>>> port with the failing one. That might give you clues as to what is
>>> causing the failure.
>
>> [ trace ]
>
> As was suspected upthread, this shows that p_isdigit() is failing to
> recognize "3" as a digit. So you've got broken locale support somewhere.
>
> There are two different implementations of p_isdigit in wparser_def.c,
> depending on whether USE_WIDE_UPPER_LOWER is defined. It should be, in
> a Windows build, but maybe this is tracing back to a configure problem?
>
> regards, tom lane
>

After debugging a bit, I think this is not a broken locale.

The first two times iswdigit() is called, the character passed in also
contains part of the next digit, so the result is always false.

Was it perhaps assumed that a wide char is always 32 bits wide?

"Unlike Windows UTF-16 2-byte wide chars, wchar_t on Linux and OS X is 4
bytes UTF-32 (gcc/g++ and XCode). On cygwin it is 2 (cygwin uses Windows
APIs)."
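
To check the size claim in the quote above on a particular platform, a small C sketch like the following could be used. It is not taken from the PostgreSQL sources; the function names wchar_bits and wchar_is_two_bytes are made up for illustration.

```c
#include <limits.h>
#include <stddef.h>
#include <wchar.h>

/* Width of wchar_t, in bits, on the platform this is compiled for. */
int wchar_bits(void)
{
    return (int) (sizeof(wchar_t) * CHAR_BIT);
}

/*
 * Nonzero when wchar_t is a 2-byte type, as the quote above says it is
 * on Windows and Cygwin; zero when it is wider (4 bytes on Linux, per
 * the same quote).
 */
int wchar_is_two_bytes(void)
{
    return sizeof(wchar_t) == 2;
}
```

Going by the quote, I would expect wchar_bits() to return 16 on Cygwin and 32 on Linux, but I have only the quote as evidence for that.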

Testing with "SELECT * FROM ts_parse('default', '345');" gives the
following gdb session:

--------------------------------------------------------------
Breakpoint 1, p_isdigit (prs=0x80100930)
at /pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:560
560 p_iswhat(digit)
(gdb) step
0x007036d8 in iswdigit ()
(gdb) step
Single stepping until exit from function iswdigit,
which has no line number information.
iswdigit (c=3407923)
at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
35 return (c >= (wint_t)'0' && c <= (wint_t)'9');
(gdb) p/x c
$77 = 0x340033
(gdb) finish
Run till exit from #0 iswdigit (c=3407923)
at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
0x0060c510 in TParserGet (prs=0x80100930)
at /pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:1834
1834 if (item->isclass(prs) != 0)
Value returned is $78 = 0

Breakpoint 1, p_isdigit (prs=0x80100930)
at /pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:560
560 p_iswhat(digit)
(gdb) step
0x007036d8 in iswdigit ()
(gdb) step
Single stepping until exit from function iswdigit,
which has no line number information.
iswdigit (c=3473460)
at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
35 return (c >= (wint_t)'0' && c <= (wint_t)'9');
(gdb) p/x c
$79 = 0x350034
(gdb) finish
Run till exit from #0 iswdigit (c=3473460)
at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
0x0060c510 in TParserGet (prs=0x80100930)
at /pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:1834
1834 if (item->isclass(prs) != 0)
Value returned is $80 = 0

Breakpoint 1, p_isdigit (prs=0x80100930)
at /pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:560
560 p_iswhat(digit)
(gdb) step
0x007036d8 in iswdigit ()
(gdb) step
Single stepping until exit from function iswdigit,
which has no line number information.
iswdigit (c=53)
at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
35 return (c >= (wint_t)'0' && c <= (wint_t)'9');
(gdb) p/x c
$81 = 0x35
(gdb) finish
Run till exit from #0 iswdigit (c=53)
at /usr/src/debug/cygwin-1.7.27-2/newlib/libc/ctype/iswdigit.c:35
0x0060c510 in TParserGet (prs=0x80100930)
at /pub/devel/postgresql/postgresql-9.3.2-2/src/postgresql-9.3.2/src/backend/tsearch/wparser_def.c:1834
1834 if (item->isclass(prs) != 0)
Value returned is $82 = 1
-------------------------------------------------------------------------
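
The three values seen in the trace (0x340033, 0x350034, 0x35) are what one would get by storing '345' as 2-byte units and then loading 4 bytes at each position. The C sketch below reproduces that arithmetic. It is only an illustration of the hypothesis, not the code in wparser_def.c; the names read_as_32bit, is_ascii_digit and sample_345 are invented here, and a little-endian host is assumed.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * "345" as 2-byte wide characters followed by a 2-byte terminator.
 * Assumption: this is how the string sits in memory when wchar_t is
 * 16 bits wide.
 */
const uint16_t sample_345[] = { 0x0033, 0x0034, 0x0035, 0x0000 };

/*
 * Load 4 bytes starting at the idx-th 2-byte unit of buf, mimicking code
 * that steps through 2-byte characters but treats each one as 32 bits.
 * memcpy is used so the unaligned load is well defined.
 * The caller must ensure that unit idx + 1 exists.
 */
uint32_t read_as_32bit(const uint16_t *buf, size_t idx)
{
    uint32_t v;

    memcpy(&v, buf + idx, sizeof v);
    return v;
}

/* Same range test as the iswdigit() line shown in the trace. */
int is_ascii_digit(uint32_t c)
{
    return c >= (uint32_t) '0' && c <= (uint32_t) '9';
}
```

On a little-endian machine, read_as_32bit(sample_345, 0), (sample_345, 1) and (sample_345, 2) give 0x340033, 0x350034 and 0x35, so is_ascii_digit() rejects the first two and accepts only the last, which matches the return values 0, 0, 1 in the trace.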
