Re: Extending range of to_tsvector et al

From:	Dan Scott <denials(at)gmail(dot)com>
To:	johnkn63 <john(dot)knightley(at)gmail(dot)com>
Cc:	pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Extending range of to_tsvector et al
Date:	2012-10-01 03:04:24
Message-ID:	CAAY5AM3pFkc=HNHbpGn_xf3wE+tcRaa4jwyG7mRC89z6mxsxZQ@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Sep 30, 2012 at 1:56 PM, johnkn63 <john(dot)knightley(at)gmail(dot)com> wrote:
> When using to_tsvector a number of newer unicode characters and pua
> characters are not included. How do I add the characters which I desire to
> be found?

I've just started digging into this code a bit, but from what I've
found src/backend/tsearch/wparser_def.c defines much of the parser
functionality, and in the area of Unicode includes a number of
comments like:

 * with multibyte encoding and C-locale isw* function may fail or give
   wrong result.
 * multibyte encoding and C-locale often are used for Asian languages.
 * any non-ascii symbol with multibyte encoding with C-locale is an
   alpha character

... in concert with ifdefs around USE_WIDE_UPPER_LOWER (defined when
both HAVE_WCSTOMBS and HAVE_TOWLOWER are defined), which complicate
testing scenarios :)
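
To make the rule in those comments concrete, here is a minimal sketch of
the classification logic they describe. It is not the parser's actual
code: the function name sketch_isalpha and the c_locale_multibyte flag
are hypothetical, chosen only for illustration, and the real code in
wparser_def.c is organized differently.

```c
#include <ctype.h>
#include <wchar.h>
#include <wctype.h>

/*
 * Hypothetical helper, not the actual parser code: classify one wide
 * character as alphabetic.
 *
 * When c_locale_multibyte is nonzero, apply the rule from the quoted
 * comments: any non-ASCII character counts as alpha, and ASCII goes
 * through plain isalpha().  Otherwise defer to iswalpha(), whose answer
 * depends on the platform's locale data.
 */
int
sketch_isalpha(wchar_t c, int c_locale_multibyte)
{
    if (c_locale_multibyte)
    {
        if (c > 0x7f)
            return 1;           /* non-ASCII: treated as alpha */
        return isalpha((unsigned char) c) != 0;
    }
    return iswalpha((wint_t) c) != 0;
}
```

With the flag set, a PUA code point such as U+E000 is classified as
alpha no matter what iswalpha() would say. With it clear, the result
depends on the locale data of the system, which is one reason the same
test can behave differently across platforms.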

Also note that src/test/regress/sql/tsearch.sql and
src/test/regress/sql/tsdicts.sql currently focus on English,
ASCII-only data.

Perhaps this is a good opportunity for you to describe your
environment (OS, PostgreSQL version, and the database's encoding and
locale settings) and show some sample to_tsquery() @@ to_tsvector()
queries that don't behave the way you expect. We could then start
building some test cases as a first step.

--
Dan Scott
Laurentian University

In response to

Extending range of to_tsvector et al at 2012-09-30 17:56:10 from johnkn63

Responses

Re: Extending range of to_tsvector et al at 2012-10-01 03:45:05 from john knightley
