From: | Richard Huxton <dev(at)archonet(dot)com> |
---|---|
To: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Full text search - altering the default parser |
Date: | 2008-02-27 11:30:57 |
Message-ID: | 47C549F1.2070703@archonet.com |
Lists: | pgsql-hackers |
The default parser doesn't allow commas in numbers (I can see why, I think).
SELECT ts_parse('default', '123,000');
ts_parse
----------
(22,123)
(12,",")
(22,000)
One option of course is to pre-process the text, but since we can
support custom parsers I thought I'd take a look at the code to teach it
some flexibility on numbers. I'm guessing this would be of interest to
anyone wanting to support European-style "," decimal indicators too.
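For comparison, the pre-processing route could look roughly like the sketch below. It is only an illustration: strip_digit_commas is a made-up helper name, not anything in PostgreSQL, and it assumes plain ASCII digits. It drops any "," that sits directly between two digits before the text is handed to the parser.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of PostgreSQL.
 * Copy src into dst, dropping any ',' that sits directly between two digits.
 * dst must have room for at least strlen(src) + 1 bytes.
 * Returns the number of bytes written, excluding the terminating NUL. */
size_t strip_digit_commas(const char *src, char *dst)
{
    size_t out = 0;

    for (size_t i = 0; src[i] != '\0'; i++) {
        /* src[i + 1] is safe to read: at worst it is the terminating NUL */
        int prev_digit = i > 0 && isdigit((unsigned char) src[i - 1]);
        int next_digit = isdigit((unsigned char) src[i + 1]);

        if (src[i] == ',' && prev_digit && next_digit)
            continue;           /* looks like a digit-group separator: skip */
        dst[out++] = src[i];
    }
    dst[out] = '\0';
    return out;
}
```

With this, "123,000" becomes "123000", but a list such as "1,2,3" also collapses to "123", which is presumably the sort of ambiguity that keeps commas out of the default number tokens.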
My "C" is horribly rusty, so can I check I've got this right? Before I
start exploring compiler errors I've not seen for decades ;-)
The parser functions (prsd_xxx) are all defined in
backend/tsearch/wparser_def.c
The state machine is driven through the TParserStateActionItem
definitions on lines 644 - 1263. Changing one of these will change the
definition of the corresponding token-type.
To add a new token-type, I'd add it to the various lists on lines 30 - 194,
then add the relevant TParserStateActionItems.
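To make the idea concrete, here is a toy table-driven state machine that accepts digits with embedded commas. It does not reproduce the real TParserStateActionItem layout or anything else from wparser_def.c; every name in it (ToyState, ToyActionItem, toy_number_length) is invented for the sketch. Each state has a list of rows, the first row whose character test matches decides the next state, and a NULL test acts as the catch-all.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical names throughout: a toy, not the wparser_def.c code. */
typedef enum { ST_START, ST_INT, ST_COMMA, ST_DONE, ST_FAIL } ToyState;

typedef int (*ToyCharTest)(int c);

typedef struct {
    ToyCharTest test;   /* NULL matches any character, including NUL */
    ToyState    next;   /* state to move to when this row matches */
} ToyActionItem;

static int is_digit(int c) { return isdigit(c) != 0; }
static int is_comma(int c) { return c == ','; }

/* One row list per non-terminal state; each list ends with a catch-all. */
static const ToyActionItem on_start[] = { {is_digit, ST_INT}, {NULL, ST_FAIL} };
static const ToyActionItem on_int[]   = { {is_digit, ST_INT},
                                          {is_comma, ST_COMMA},
                                          {NULL, ST_DONE} };
static const ToyActionItem on_comma[] = { {is_digit, ST_INT}, {NULL, ST_FAIL} };

/* Indexed by ToyState: ST_START, ST_INT, ST_COMMA. */
static const ToyActionItem *const toy_table[] = { on_start, on_int, on_comma };

/* Length of the number token at the start of text, or 0 if there is none.
 * A comma only counts as part of the token when a digit follows it. */
size_t toy_number_length(const char *text)
{
    ToyState state = ST_START;
    size_t pos = 0;
    size_t accepted = 0;    /* longest valid token seen so far */

    while (state != ST_DONE && state != ST_FAIL) {
        int c = (unsigned char) text[pos];
        const ToyActionItem *item = toy_table[state];

        while (item->test != NULL && !item->test(c))
            item++;                 /* first matching row wins */
        state = item->next;
        if (state == ST_INT)
            accepted = pos + 1;     /* token may end after this digit */
        pos++;
    }
    return accepted;
}
```

In this toy, toy_number_length("123,000") covers all seven characters, while "123, and" stops after "123" because the comma is not followed by a digit. Changing what counts as a number is then a matter of editing the rows, which is the property I'm hoping the real tables share.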
Thanks
--
Richard Huxton
Archonet Ltd