BUG #4306: TSearch2 stemming, stop words and lexize behaviour inconsistent

From: "Yishai Lerner" <yish(at)alum(dot)mit(dot)edu>
To: pgsql-bugs(at)postgresql(dot)org
Subject: BUG #4306: TSearch2 stemming, stop words and lexize behaviour inconsistent
Date: 2008-07-14 21:04:41
Message-ID: 200807142104.m6EL4fcq051121@wwwmaster.postgresql.org
Lists: pgsql-bugs


The following bug has been logged online:

Bug reference: 4306
Logged by: Yishai Lerner
Email address: yish(at)alum(dot)mit(dot)edu
PostgreSQL version: 8.3.1
Operating system: RHEL5 and MacOSX 10.4
Description: TSearch2 stemming, stop words and lexize behaviour inconsistent
Details:

I would expect to_tsquery to behave consistently for the three variations
"what", "what's" and "whats", and for all of them to be ignored, since
each one reduces to the stop word "what". However, this is not the case:
to_tsquery('whats') returns the stop word 'what' as a result. Even more
confusing, the lexize results below are inconsistent with the to_tsquery
results below. This seems like a bug to me.

goodrec_2=# select lexize('en_stem', 'what''s');
lexize
--------
{what}

goodrec_2=# select lexize('en_stem', 'whats');
lexize
--------
{what}

goodrec_2=# select lexize('en_stem', 'what');
lexize
--------
{}

goodrec_2=# select to_tsquery('what''s');
NOTICE: query contains only stopword(s) or doesn't contain lexeme(s), ignored
to_tsquery

goodrec_2=# select to_tsquery('whats');
to_tsquery
------------
'what'

goodrec_2=# select to_tsquery('what');
NOTICE: query contains only stopword(s) or doesn't contain lexeme(s), ignored
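
A minimal sketch of how the difference could be narrowed down, assuming the
8.3 built-in text search functions ts_debug and ts_lexize, the built-in
'english' configuration and its 'english_stem' dictionary (the session above
uses the tsearch2 compatibility names lexize and 'en_stem' instead). ts_debug
reports how the parser splits the input into tokens and what the dictionary
returns for each token, whereas ts_lexize hands the whole string to a single
dictionary without parsing it first, so the two need not agree:

```sql
-- Assumption: built-in 'english' configuration (PostgreSQL 8.3 core text search).
-- Show each token the parser produces and the lexemes its dictionary returns.
SELECT alias, token, dictionary, lexemes FROM ts_debug('english', 'what''s');
SELECT alias, token, dictionary, lexemes FROM ts_debug('english', 'whats');
SELECT alias, token, dictionary, lexemes FROM ts_debug('english', 'what');

-- Assumption: built-in 'english_stem' dictionary.
-- Feed the unparsed string directly to the stemming dictionary for comparison.
SELECT ts_lexize('english_stem', 'what''s');
```

If the parser splits "what's" into separate tokens before any dictionary sees
them, that alone might account for to_tsquery and lexize disagreeing on it.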
