Re: text search: restricting the number of parsed words in headline generation

From: Alvaro Herrera <alvherre(at)commandprompt(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: sushant354 <sushant354(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: text search: restricting the number of parsed words in headline generation
Date: 2011-08-23 19:12:50
Message-ID: 1314126653-sup-3641@alvh.no-ip.org
Lists: pgsql-hackers

Excerpts from Tom Lane's message of Tue Aug 23 15:59:18 -0300 2011:
> Sushant Sinha <sushant354(at)gmail(dot)com> writes:
> > Given a document and a query, the goal of headline generation is to
> > produce text excerpts in which the query appears.
>
> ... right ...
>
> > Here is a simple patch that limits the number of words during the
> > tokenization phase and puts an upper-bound on the headline generation.
>
> Doesn't this force the headline to be taken from the first N words of
> the document, independent of where the match was? That seems rather
> unworkable, or at least unhelpful.

Yeah ...

Doesn't a search result include the positions at which the tokens were
found within the document? Wouldn't it make more sense to improve the
system somehow so that it restricts the search for headlines to the
general area where the tokens were found?
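
To make the idea concrete, here is a rough sketch in Python, not
PostgreSQL code. It assumes the match positions are already available
as 0-based word indexes; the names candidate_windows,
headline_candidates and radius are made up for illustration.

```python
def candidate_windows(positions, doc_len, radius=10):
    """Return merged (start, end) word ranges around each match position.

    positions: 0-based word indexes where query tokens matched (assumed input).
    doc_len:   total number of words in the document.
    radius:    words of context kept on each side (hypothetical knob).
    The end of each range is exclusive.
    """
    windows = []
    for pos in sorted(positions):
        start = max(0, pos - radius)
        end = min(doc_len, pos + radius + 1)
        if windows and start <= windows[-1][1]:
            # Overlaps or touches the previous window: extend it.
            windows[-1] = (windows[-1][0], max(windows[-1][1], end))
        else:
            windows.append((start, end))
    return windows


def headline_candidates(words, positions, radius=10):
    """Cut out only the text near the matches instead of parsing everything."""
    return [
        " ".join(words[start:end])
        for start, end in candidate_windows(positions, len(words), radius)
    ]
```

With this, the cost of picking a headline depends on the number of
matches and the window size, not on the length of the document, and the
excerpt is still taken from wherever the match actually was.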

--
Álvaro Herrera <alvherre(at)commandprompt(dot)com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
