
Re: Extending range of to_tsvector et al

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: john knightley <john(dot)knightley(at)gmail(dot)com>
Cc: Dan Scott <denials(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Extending range of to_tsvector et al
Date: 2012-10-01 04:11:18
Message-ID: 28864.1349064678@sss.pgh.pa.us
Lists: pgsql-hackers
john knightley <john(dot)knightley(at)gmail(dot)com> writes:
> The OS I am using is Ubuntu 12.04, with PostgreSQL 9.1.5 installed on
> a UTF-8 locale.

> A short 5-line dictionary file is sufficient to test:

> raeuz
> 
> 
> 
> 

> line 1 "raeuz": a Zhuang word written using English letters; shows up
> under ts_vector OK
> line 2 "": an everyday Chinese word; shows up under ts_vector OK
> line 3 "": a Zhuang word written using rather old Chinese characters
> found in Unicode 3.1, which came out in about 2000; shows up
> under ts_vector OK
> line 4 "": a Zhuang word written using rather old Chinese characters
> found in Unicode 5.2, which came out in about 2009; does not show
> up under ts_vector
> line 5 "": a Zhuang word written using rather old Chinese characters
> found in the Private Use Area (PUA) of the font Sawndip.ttf; does not
> show up under ts_vector (the font can be downloaded from
> http://gdzhdb.l10n-support.com/sawndip-fonts/Sawndip.ttf)

AFAIK there is nothing in Postgres itself that would distinguish, say,
 from .  I think this must be down to
your platform's locale definition: it probably thinks that the former is
a letter and the latter is not.  You'd have to gripe to the locale
maintainers to get that fixed.

			regards, tom lane


