Re: BUG #8354: stripped positions can generate nonzero rank in ts_rank_cd

From: Alexander Hill <alex(at)hill(dot)net(dot)au>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #8354: stripped positions can generate nonzero rank in ts_rank_cd
Date: 2014-02-06 18:07:59
Message-ID: CA+KBOKxsaU7Q-Qc6-YV99AY1U_Rb0SbR78Bs5xM6=12PRMKJKA@mail.gmail.com
Lists: pgsql-bugs

Hi Bruce, all,

I think this can be solved (if it's agreed that it's a bug) in a pretty
straightforward way: when creating the document representation used in
calculating cover density rank, we can just skip lexemes with no position
entirely.
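
To illustrate the idea only (this is not the linked patch, and it is not PostgreSQL's actual ranking code), here is a minimal Python model. Every name in it is hypothetical: a tsvector is modelled as a dict from lexeme to a list of positions, an empty list stands for a stripped lexeme, and the query is treated as a plain AND of its terms.

```python
# Illustrative model only; NOT PostgreSQL's implementation.
# All names and data shapes below are assumptions made for this sketch.
from typing import Dict, List, Tuple

# Toy tsvector: lexeme -> positions. An empty list means "stripped".
TsVector = Dict[str, List[int]]


def build_doc_representation(
    vector: TsVector, query_terms: List[str]
) -> List[Tuple[int, str]]:
    """Return (position, lexeme) pairs for the query terms, sorted by position.

    Lexemes that are absent or have no positions are skipped entirely,
    so they can never contribute to a cover.
    """
    doc: List[Tuple[int, str]] = []
    for term in query_terms:
        positions = vector.get(term)
        if not positions:  # missing from the vector, or stripped
            continue
        for pos in positions:
            doc.append((pos, term))
    doc.sort()
    return doc


def has_cover(vector: TsVector, query_terms: List[str]) -> bool:
    """True if every query term has at least one positioned occurrence."""
    doc = build_doc_representation(vector, query_terms)
    return {lexeme for _, lexeme in doc} == set(query_terms)
```

In this model, a vector such as {"text": [1], "search": []} yields a document representation holding only "text", so no cover exists for the query terms "text" and "search", and a cover density rank built on top of it would have nothing to score.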

Fix and tests here: https://github.com/AlexHill/postgres/compare/bug_8354

As a patch file here:
https://github.com/AlexHill/postgres/commit/cd522b254d166d569b86803115f0f499864e949b.patch

Cheers,
Alex

On Sat, Feb 1, 2014 at 5:22 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:

>
> Would someone please comment on this text search bug report? Thanks.
>
> ---------------------------------------------------------------------------
>
> On Fri, Aug 2, 2013 at 07:03:42AM +0000, alex(at)hill(dot)net(dot)au wrote:
> > The following bug has been logged on the website:
> >
> > Bug reference: 8354
> > Logged by: Alex Hill
> > Email address: alex(at)hill(dot)net(dot)au
> > PostgreSQL version: 9.2.4
> > Operating system: OS X 10.8.4 Mountain Lion
> > Description:
> >
> > Hi all,
> >
> >
> > The docs for ts_rank_cd state:
> >
> >
> > "This function requires positional information in its input. Therefore it
> > will not work on "stripped" tsvector values -- it will always return
> > zero."
> >
> >
> > However if a tsvector contains some stripped lexemes and some
> > non-stripped, ts_rank_cd will rank extents including the non-stripped
> > values.
> >
> >
> > For example, this evaluates to zero as expected:
> >
> >
> > SELECT ts_rank_cd(strip(to_tsvector('text search')),
> > plainto_tsquery('text search'))
> >
> >
> >
> >
> > But this doesn't:
> >
> >
> > SELECT ts_rank_cd(to_tsvector('text') || strip(to_tsvector('search')),
> > plainto_tsquery('text search'))
> >
> >
> >
> >
> > I think this is a bug, if not in the code then in the documentation, which
> > isn't clear on what happens when stripped and positioned lexemes are mixed
> > in one tsvector.
> >
> >
> > I would prefer that stripped lexemes were completely ignored by ts_rank_cd:
> > my use case is using this as a fifth pseudo-weight, which matches a @@ query
> > but doesn't add to a ts_rank_cd ranking.
> >
> >
> > What do you think?
> >
> >
> > Cheers,
> > Alex
> >
> >
> >
> > --
> > Sent via pgsql-bugs mailing list (pgsql-bugs(at)postgresql(dot)org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-bugs
>
> --
> Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + Everyone has their own god. +
>
