Re: english parser in text search: support for multiple words in the same position

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>
Cc: sushant354(at)gmail(dot)com, Markus Wanner <markus(at)bluegap(dot)ch>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: english parser in text search: support for multiple words in the same position
Date: 2010-08-02 14:26:57
Message-ID: AANLkTi=C9upTVxEm8M_xzz5djmub+h+9bW-16HJRDc70@mail.gmail.com
Lists: pgsql-hackers

On Mon, Aug 2, 2010 at 10:21 AM, Kevin Grittner
<Kevin(dot)Grittner(at)wicourts(dot)gov> wrote:
> Sushant Sinha <sushant354(at)gmail(dot)com> wrote:
>
>> Yes, that's what I am planning to do. I just wanted to see if anyone
>> can help me estimate whether this is doable in the current
>> parser or whether I need to write a new one. If it is possible, any
>> ideas on how to go about implementing it?
>
> The current tsearch parser is a state machine which does clunky mode
> switches to handle special cases like you describe.  If you're
> looking at doing very much in there, you might want to consider a
> rewrite to something based on regular expressions.  See discussion
> in these threads:
>
> http://archives.postgresql.org/message-id/200912102005.16560.andres@anarazel.de
>
> http://archives.postgresql.org/message-id/4B210D9E020000250002D344@gw.wicourts.gov
>
> That was actually at the top of my personal PostgreSQL TODO list
> (after my current project is wrapped up), but I wouldn't complain if
> someone else wanted to take it.  :-)

If you end up rewriting it, it may be a good idea, in the initial
rewrite, to mimic the current results as closely as possible - and
then submit a separate patch to change the results. Changing two
things at the same time greatly increases the chance of your
patch getting rejected.
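
To make the "mimic first, change later" approach concrete, here is a
minimal sketch in Python of checking a rewrite against the existing
behavior. Everything in it is hypothetical: legacy_tokenize is a toy
stand-in for a hand-written state machine, regex_tokenize is a toy
stand-in for a regular-expression rewrite, and neither reflects the
actual tsearch parser or its token types. The point is only the shape of
the check: run both over the same inputs and require identical output
before proposing any change in results.

```python
import re
import string

# Hypothetical toy tokenizers; NOT the PostgreSQL tsearch parser.

def legacy_tokenize(text):
    """Hand-written state machine emitting (type, token) pairs."""
    tokens = []
    state = "space"  # current mode: "space", "word", or "number"
    start = 0
    # A trailing space forces the final token to be flushed.
    for i, ch in enumerate(text + " "):
        if ch in string.ascii_letters:
            new_state = "word"
        elif ch in string.digits:
            new_state = "number"
        else:
            new_state = "space"
        if new_state != state:
            # Mode switch: emit the token that just ended, if any.
            if state != "space":
                tokens.append((state, text[start:i]))
            state, start = new_state, i
    return tokens

# Regex rewrite intended to reproduce the same token stream.
TOKEN_RE = re.compile(r"(?P<word>[A-Za-z]+)|(?P<number>[0-9]+)")

def regex_tokenize(text):
    """Regex-based tokenizer emitting (type, token) pairs."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(text)]

def find_mismatches(samples):
    """Return the inputs on which the two tokenizers disagree."""
    return [s for s in samples if legacy_tokenize(s) != regex_tokenize(s)]

# Illustrative inputs only; a real check would use a much larger corpus.
SAMPLES = ["", "hello world", "abc123 def", "wi-fi 802.11b", "  spaced   out  "]

if __name__ == "__main__":
    mismatches = find_mismatches(SAMPLES)
    print("mismatching inputs:", mismatches)
```

An empty mismatch list on a representative corpus is what lets the first
patch be reviewed as a pure refactoring. Any intentional difference in
results then goes into the follow-up patch.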

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise Postgres Company
