
Full text search - altering the default parser

From: Richard Huxton <dev(at)archonet(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Full text search - altering the default parser
Date: 2008-02-27 11:30:57
Message-ID: 47C549F1.2070703@archonet.com
Lists: pgsql-hackers
The default parser doesn't allow commas in numbers (I can see why, I think).

SELECT ts_parse('default', '123,000');
  ts_parse
----------
  (22,123)
  (12,",")
  (22,000)

One option of course is to pre-process the text, but since we can 
support custom parsers I thought I'd take a look at the code to teach it 
some flexibility on numbers. I'm guessing this would be of interest to 
anyone wanting to support European-style "," decimal indicators too.
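
For comparison, the pre-processing option could be sketched roughly as below. This is only an illustration, not PostgreSQL code: strip_digit_commas and is_group_separator are hypothetical names, and the rule used (a comma counts as a thousands separator only when it follows a digit and is followed by exactly three digits) is an assumption. Anything else, including a European-style decimal such as "1,5", is left untouched.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: is s[i] a ',' used as a thousands separator?
 * Assumed rule: a digit before it, exactly three digits after it.
 * Never reads past the terminating '\0', because each lookahead
 * happens only after the previous character was found to be a digit.
 */
static int
is_group_separator(const char *s, size_t i)
{
    if (s[i] != ',' || i == 0 || !isdigit((unsigned char) s[i - 1]))
        return 0;
    for (size_t k = 1; k <= 3; k++)
        if (!isdigit((unsigned char) s[i + k]))
            return 0;
    return !isdigit((unsigned char) s[i + 4]);
}

/*
 * Copy src to dst, dropping thousands-separator commas.
 * dst must have room for strlen(src) + 1 bytes.
 * Returns the length of the text written to dst.
 */
static size_t
strip_digit_commas(const char *src, char *dst)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    for (size_t i = 0; src[i] != '\0'; i++)
    {
        if (is_group_separator(src, i))
            continue;           /* skip the separator itself */
        dst[out++] = src[i];
    }
    dst[out] = '\0';
    return out;
}
```

Text cleaned this way would reach the default parser with "123,000" already written as "123000", at the cost of changing the stored text and of having to guess which commas are separators.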

My C is horribly rusty, so can I check I've got this right before I 
start exploring compiler errors I've not seen for decades? ;-)


The parser functions (prsd_xxx) are all defined in 
backend/tsearch/wparser_def.c

The state machine is driven through the TParserStateActionItem 
definitions on lines 644 - 1263. Changing one of these will change the 
definition of the corresponding token-type.

To add a new token-type, I'd add it to the various lists on lines 30-194, 
then add the relevant TParserStateActionItems.
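
To make the idea concrete for myself, here is a toy table-driven scanner in the same spirit. It is a sketch only: ToyState, ToyActionItem and toy_number_len are made-up names, and the layout is an assumption, not the real TParserStateActionItem structure in wparser_def.c. Each state has a list of (character test, next state) items, and accepting commas inside a number comes down to adding one item and one state.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical states; S_STOP ends the scan. */
typedef enum { S_START, S_INT, S_COMMA, S_STOP } ToyState;

typedef int (*CharTest) (int c);

/* One row of a state's action list; test == NULL is the default row. */
typedef struct
{
    CharTest    test;
    ToyState    next;
} ToyActionItem;

static int is_digit_ch(int c) { return isdigit(c) != 0; }
static int is_comma_ch(int c) { return c == ','; }

static const ToyActionItem start_items[] = {
    {is_digit_ch, S_INT},
    {NULL, S_STOP}
};

static const ToyActionItem int_items[] = {
    {is_digit_ch, S_INT},
    {is_comma_ch, S_COMMA},     /* the added row: allow ',' inside a number */
    {NULL, S_STOP}
};

static const ToyActionItem comma_items[] = {
    {is_digit_ch, S_INT},       /* a comma must be followed by a digit */
    {NULL, S_STOP}
};

/* Indexed by ToyState (S_START, S_INT, S_COMMA). */
static const ToyActionItem *const toy_table[] = {
    start_items, int_items, comma_items
};

/*
 * Length of the number token at the start of s, or 0 if there is none.
 * A trailing comma is not part of the token: the length is recorded
 * only after a digit has been consumed.
 */
static size_t
toy_number_len(const char *s)
{
    ToyState    state = S_START;
    size_t      pos = 0;
    size_t      accepted = 0;

    assert(s != NULL);
    while (state != S_STOP)
    {
        const ToyActionItem *item = toy_table[state];
        int         c = (unsigned char) s[pos];

        while (item->test != NULL && !item->test(c))
            item++;             /* first matching row wins */
        state = item->next;
        if (state != S_STOP)
            pos++;              /* consume the character */
        if (state == S_INT)
            accepted = pos;
    }
    return accepted;
}
```

With this table, toy_number_len("123,000") covers the whole string, while "123, next" still yields just "123" because the comma is not followed by a digit.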

Thanks

-- 
   Richard Huxton
   Archonet Ltd
