Re: Weird problem concerning tsearch functions built into postgres 8.3, assistance requested

From: Teodor Sigaev <teodor(at)sigaev(dot)ru>
To: cheighlund(at)yahoo(dot)com
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Weird problem concerning tsearch functions built into postgres 8.3, assistance requested
Date: 2008-10-30 13:37:40
Message-ID: 4909B8A4.4050706@sigaev.ru
Lists: pgsql-general
> One of the tables we're using in the 8.1.3 setups currently running 
> includes phone numbers as a searchable field (fti_phone), with the 
> results of a select on the field generally looking like this: 'MMM':2 
> 'NNNN':3 'MMM-NNNN':1.  MMM is the first three digits, NNNN is the 
> fourth-seventh.
> 
> The weird part is this: On the old systems running 8.1.3, I can look up
> a record by fti_phone using any of the three above items; first three,
> last four, or entire number including dash.  On the new system running
> 8.3.1, I can do a lookup by the first three or the last four and get the
> results I'm after, but any attempt to do a lookup by the entire MMM-NNNN
> version returns no records.

The parser was changed:
postgres=# select  * from ts_debug('123-4567');
  alias |   description    | token | dictionaries | dictionary | lexemes
-------+------------------+-------+--------------+------------+---------
  uint  | Unsigned integer | 123   | {simple}     | simple     | {123}
  int   | Signed integer   | -4567 | {simple}     | simple     | {-4567}
(2 rows)
postgres=# select  * from ts_debug('abc-defj');
      alias      |           description           |  token   |  dictionaries  |  dictionary  |  lexemes
-----------------+---------------------------------+----------+----------------+--------------+------------
 asciihword      | Hyphenated word, all ASCII      | abc-defj | {english_stem} | english_stem | {abc-defj}
 hword_asciipart | Hyphenated word part, all ASCII | abc      | {english_stem} | english_stem | {abc}
 blank           | Space symbols                   | -        | {}             |              |
 hword_asciipart | Hyphenated word part, all ASCII | defj     | {english_stem} | english_stem | {defj}

The parser in 8.1 treats any [alnum]+-[alnum]+ as a hyphenated word, but 8.3
treats [digit]+-[digit]+ as two separate numbers.

So you can either pre-process the text before indexing, or have a look at the
regex dictionary (http://vo.astronet.ru/arxiv/dict_regex.html)
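
As an illustration of the pre-processing option, here is a minimal sketch in
Python, assuming the text is rewritten in the application before it is passed
to to_tsvector. The function name preprocess_phone_text and the choice to emit
"joined first second" are assumptions made for this example, not something
from the original setup. The idea is that a value like 123-4567 is expanded
into plain digit runs, which the 8.3 parser shown above would be expected to
tokenize as ordinary unsigned integers.

```python
import re

# Matches two digit groups joined by a single hyphen, e.g. "123-4567".
# Hyphenated words such as "abc-defj" are left alone, since 8.3 still
# treats those as hyphenated words.
_DIGITS_HYPHEN_DIGITS = re.compile(r"\b(\d+)-(\d+)\b")


def preprocess_phone_text(text: str) -> str:
    """Expand each digits-hyphen-digits run into 'joined first second'.

    Hypothetical helper: only handles a single hyphen between two
    digit groups; longer forms (e.g. with an area code) would need
    a broader pattern.
    """
    def expand(match: re.Match) -> str:
        first, second = match.group(1), match.group(2)
        return f"{first}{second} {first} {second}"

    return _DIGITS_HYPHEN_DIGITS.sub(expand, text)


if __name__ == "__main__":
    print(preprocess_phone_text("123-4567"))
    print(preprocess_phone_text("abc-defj"))
```

The same function would have to be applied to the search string before the
query is built, so that a lookup for the full number is turned into the same
digit runs that were indexed.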
-- 
Teodor Sigaev                                   E-mail: teodor(at)sigaev(dot)ru
                                                    WWW: http://www.sigaev.ru/
