Re: Request for review: tsearch2 patch

From: Tatsuo Ishii <ishii(at)sraoss(dot)co(dot)jp>
To: teodor(at)sigaev(dot)ru
Cc: ishii(at)postgresql(dot)org, pgsql-hackers(at)postgresql(dot)org, oleg(at)sai(dot)msu(dot)su
Subject: Re: Request for review: tsearch2 patch
Date: 2007-01-11 01:36:25
Message-ID: 20070111.103625.15249140.t-ishii@sraoss.co.jp
Lists: pgsql-hackers

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: [HACKERS] Request for review: tsearch2 patch
Date: Wed, 10 Jan 2007 18:50:44 +0300
Message-ID: <45A50B54(dot)6090608(at)sigaev(dot)ru>

> > I have tested with a locale-enabled environment and found a bug. Included
> > is the new version of the patches.
> Your patch causes a crash in tsearch2's installcheck on a cluster created with
> 'initdb -E UTF8 --locale C'. A simple way to reproduce it:
> # select to_tsquery('default', '''New York''');
> server closed the connection unexpectedly
> This probably means the server terminated abnormally
> before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.

It seems to be a bug in the original tsearch2. Here is the patch.

------------------------------------------------------------------
*** wordparser/parser.c~ 2007-01-07 09:54:39.000000000 +0900
--- wordparser/parser.c 2007-01-11 10:33:41.000000000 +0900
***************
*** 51,57 ****
if (prs->charmaxlen > 1)
{
prs->usewide = true;
! prs->wstr = (wchar_t *) palloc(sizeof(wchar_t) * prs->lenstr);
prs->lenwstr = char2wchar(prs->wstr, prs->str, prs->lenstr);
}
else
--- 51,57 ----
if (prs->charmaxlen > 1)
{
prs->usewide = true;
! prs->wstr = (wchar_t *) palloc(sizeof(wchar_t) * (prs->lenstr+1));
prs->lenwstr = char2wchar(prs->wstr, prs->str, prs->lenstr);
}
else
------------------------------------------------------------------
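Why the extra slot matters, as a sketch: presumably the C-locale path of char2wchar() stores a trailing zero after the converted characters, as PostgreSQL's pg_mb2wchar_with_len() does. A pure-ASCII string such as 'New York' then converts to exactly lenstr wide characters, and the terminator lands one wchar_t past a buffer of lenstr elements. The C below is not tsearch2 code. utf8_to_wchar_with_len() and to_wide() are hypothetical helpers that mimic that behaviour with a minimal UTF-8 decoder, and malloc() stands in for palloc().

```c
#include <stdlib.h>
#include <wchar.h>

/*
 * Hypothetical helper, NOT tsearch2's char2wchar(): decodes exactly `len`
 * bytes of UTF-8 from `from` into `to` and then ALWAYS stores a trailing
 * L'\0', mimicking pg_mb2wchar_with_len().  Returns the number of wide
 * characters written, excluding the terminator.  Minimal decoder: invalid
 * lead bytes become U+FFFD and a truncated final sequence is dropped.
 */
static size_t
utf8_to_wchar_with_len(wchar_t *to, const char *from, size_t len)
{
    const unsigned char *s = (const unsigned char *) from;
    size_t i = 0;
    size_t n = 0;

    while (i < len)
    {
        unsigned char c = s[i];
        unsigned long cp;
        size_t extra;

        if (c < 0x80)                { cp = c;        extra = 0; }
        else if ((c & 0xE0) == 0xC0) { cp = c & 0x1F; extra = 1; }
        else if ((c & 0xF0) == 0xE0) { cp = c & 0x0F; extra = 2; }
        else if ((c & 0xF8) == 0xF0) { cp = c & 0x07; extra = 3; }
        else                         { cp = 0xFFFD;   extra = 0; }

        if (extra >= len - i)       /* sequence runs past `len` bytes */
            break;
        for (size_t k = 1; k <= extra; k++)
            cp = (cp << 6) | (s[i + k] & 0x3F);

        to[n++] = (wchar_t) cp;
        i += extra + 1;
    }
    to[n] = L'\0';                  /* n can equal len, hence len + 1 slots */
    return n;
}

/*
 * Hypothetical wrapper mirroring the patched allocation.  For pure ASCII
 * input every byte becomes one wide character, so the result length equals
 * lenstr and the terminator goes to wstr[lenstr]: lenstr + 1 slots needed.
 */
static wchar_t *
to_wide(const char *str, size_t lenstr, size_t *lenwstr)
{
    wchar_t *wstr = malloc(sizeof(wchar_t) * (lenstr + 1));

    if (wstr == NULL)
        return NULL;
    *lenwstr = utf8_to_wchar_with_len(wstr, str, lenstr);
    return wstr;
}
```

With only lenstr slots, 'New York' (8 bytes) would decode to 8 characters and the terminator would be written to wstr[8], one element past the end. That overrun is consistent with the crash reported above.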

> >> ! static int p_isalnum(TParser *prs) {
> ...
> >> ! if (lc_ctype_is_c())
> >> ! {
> >> ! if (c > 0x7f)
> >> ! return 1;
>
> I have some doubts that every character greater than 0x7f is an alphabetic symbol.
> Is this a simple assumption or a workaround?

Yeah, it's a workaround. Since tsearch2 has no character classes other
than alpha/numeric/latin, Asian characters have to fall into one of
them.
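A minimal sketch of the classification rule under discussion, assuming the input is a single wide character and that a flag says whether LC_CTYPE is C. workaround_isalnum() and its ctype_is_c parameter are hypothetical stand-ins for the patch's p_isalnum() and lc_ctype_is_c(). Only the "above 0x7f means alphanumeric" branch comes from the quoted patch; the fallbacks to isalnum()/iswalnum() are assumptions.

```c
#include <ctype.h>
#include <stdbool.h>
#include <wctype.h>

/*
 * Hypothetical sketch of the workaround, not the patch itself.
 * In the C locale the C library cannot classify non-ASCII characters,
 * so every code point above 0x7f (e.g. CJK ideographs) is reported as
 * alphanumeric; otherwise tsearch2 would have no class to put them in.
 */
static int
workaround_isalnum(wint_t c, bool ctype_is_c)
{
    if (c == WEOF)
        return 0;                           /* not a character at all */
    if (ctype_is_c)
    {
        if (c > 0x7f)
            return 1;                       /* the workaround */
        return isalnum((int) c) != 0;       /* plain ASCII */
    }
    return iswalnum(c) != 0;                /* locale-aware path */
}
```

This shows the trade-off raised in the question: in the C locale, non-ASCII punctuation such as U+3001 (ideographic comma) is also reported as alphanumeric.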
--
Tatsuo Ishii
SRA OSS, Inc. Japan
