Re: BUG #3730: Creating a swedish dictionary fails

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: penty(dot)wenngren(at)dgc(dot)se, pgsql-bugs(at)postgresql(dot)org, Oleg Bartunov <oleg(at)sai(dot)msu(dot)su>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: BUG #3730: Creating a swedish dictionary fails
Date: 2007-11-09 19:10:15
Message-ID: 20071109191015.GC7161@alvh.no-ip.org
Lists: pgsql-bugs

Tom Lane wrote:
> Alvaro Herrera <alvherre(at)commandprompt(dot)com> writes:
> > I am wondering if the newline being included in the token could be
> > causing a problem.
>
> Nope. I traced through it and the problem is that char2wchar() is
> completely brain-dead: at some places it thinks that "len" is the
> length of the output wchar array, and at others it thinks that "len"
> is the number of bytes in the input. In particular, _t_isalpha()
> fails completely for any multibyte character, because the pnstrdup
> call truncates the character to 1 byte.

Ah, that explains it. I was reading that code too and did not
understand what was going on.
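
To make the truncation failure concrete, here is a minimal sketch, assuming
a UTF-8 encoding (the thread does not state which encoding was in use).
The helpers utf8_seq_len() and holds_complete_char() are hypothetical
names invented for this illustration and are not PostgreSQL functions.
They only inspect the lead byte and do not validate continuation bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: number of bytes a UTF-8 character occupies,
 * judged from its lead byte alone; 0 if the byte cannot start a character.
 */
static size_t
utf8_seq_len(unsigned char lead)
{
    if (lead < 0x80)
        return 1;               /* plain ASCII */
    if ((lead & 0xE0) == 0xC0)
        return 2;
    if ((lead & 0xF0) == 0xE0)
        return 3;
    if ((lead & 0xF8) == 0xF0)
        return 4;
    return 0;                   /* continuation or invalid lead byte */
}

/*
 * Hypothetical helper: returns 1 if the first nbytes bytes of s are
 * enough to hold one whole character, 0 otherwise.
 */
static int
holds_complete_char(const char *s, size_t nbytes)
{
    size_t      need;

    assert(s != NULL);
    if (nbytes == 0)
        return 0;
    need = utf8_seq_len((unsigned char) s[0]);
    return need != 0 && need <= nbytes;
}
```

Under that assumption, a two-byte character such as "\xC3\xA4" passes the
check with a byte count of 2 and fails it with a byte count of 1, which is
the situation a copy truncated to one byte would leave behind.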

> After looking at the callers I'm inclined to think that the only
> safe way to implement this routine is to change its API to provide
> both counts. Comments?

+1
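
As a rough illustration of what "both counts" could look like, here is a
sketch only, not the actual PostgreSQL routine or the change that was
eventually made. The name char2wchar_sketch() and its parameter order are
assumptions for this example. It relies on the standard C mbrtowc(), so
results for non-ASCII input depend on the current locale.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/*
 * Hypothetical sketch: convert at most fromlen bytes of "from" into at
 * most tolen - 1 wide characters in "to", always null-terminating.
 * Returns the number of wide characters written, or (size_t) -1 if the
 * input holds an invalid or incomplete multibyte sequence.
 */
static size_t
char2wchar_sketch(wchar_t *to, size_t tolen, const char *from, size_t fromlen)
{
    mbstate_t   state;
    size_t      nout = 0;

    assert(to != NULL && from != NULL && tolen > 0);
    memset(&state, 0, sizeof(state));

    /* Output is bounded by tolen, input by fromlen; the two never mix. */
    while (fromlen > 0 && nout + 1 < tolen)
    {
        size_t      used = mbrtowc(&to[nout], from, fromlen, &state);

        if (used == (size_t) -1 || used == (size_t) -2)
        {
            to[nout] = L'\0';
            return (size_t) -1; /* invalid or truncated character */
        }
        if (used == 0)
            break;              /* embedded null byte ends the input */
        from += used;
        fromlen -= used;
        nout++;
    }
    to[nout] = L'\0';
    return nout;
}
```

With separate counts, a caller testing a single character can pass the
character's full byte length as fromlen and a small fixed tolen, instead of
having one "len" stand for both.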

--
Alvaro Herrera http://www.flickr.com/photos/alvherre/
Licensee shall have no right to use the Licensed Software
for productive or commercial use. (Licencia de StarOffice 6.0 beta)
