Re: BUG #15689: Stemming of negation/not operator

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: ivanviragine(at)gmail(dot)com
Cc: pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #15689: Stemming of negation/not operator
Date: 2019-03-12 22:34:02
Message-ID: 16223.1552430042@sss.pgh.pa.us
Lists: pgsql-bugs

PG Bug reporting form <noreply(at)postgresql(dot)org> writes:
> When using to_tsquery function it is stemming negation/not parts of the
> query, where it probably shouldn't.
> Some examples:

> SELECT to_tsquery('english', 'car & !cars');
> to_tsquery
> ----------------
> 'car' & !'car'

I'm not exactly convinced by this argument, because it seems like
you're only thinking about a corner case. There are probably at
least as many examples where you *do* want stemming on a negated term.
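As an illustration of a case where stemming the negated term is what the user wants, here is a minimal sketch. The sample strings are made up for the example; only to_tsvector, to_tsquery and the @@ operator are standard.

```sql
-- The negated term is stemmed to 'car', so a document that only
-- mentions the singular "car" is still excluded.
SELECT to_tsvector('english', 'jaguar car dealership')
       @@ to_tsquery('english', 'jaguar & !cars');   -- false

-- A document with no form of "car" at all still matches.
SELECT to_tsvector('english', 'jaguar habitat')
       @@ to_tsquery('english', 'jaguar & !cars');   -- true
```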

Another issue is that even if we changed the tsquery input function
to not stem particular words, I doubt that it would do anything useful,
because what it will be comparing to is tsvector entries that have
certainly been stemmed. That is, even if the original document said
"cars", what's going to be in the tsvector is just "car", so that
forbidding a match to "cars" isn't going to do anything. (Maybe
what this says is that there should be a less-lossy recheck against
the original document after the tsvector match, but that'd have to
be done by an additional, explicit operator I think. Or possibly
the recheck just requires tsquery match with a different stemming
configuration.)
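To make the point concrete, the following sketch shows that the tsvector side has already lost the plural, and one possible shape for the recheck idea using the 'simple' configuration, which does not stem. The docs(body) sample data is hypothetical, and this is only one way such a recheck might be written, not an existing feature.

```sql
-- The document side is stemmed when the tsvector is built.
SELECT to_tsvector('english', 'cars');                       -- 'car':1

-- A raw cast to tsquery does no stemming, so the lexeme 'cars'
-- has nothing to match against in the stemmed tsvector.
SELECT to_tsvector('english', 'cars') @@ 'cars'::tsquery;    -- false

-- Sketch of a recheck with a different (non-stemming) configuration:
-- match on the stemmed form, then exclude the exact word "cars".
SELECT body
FROM (VALUES ('a red car'), ('two cars')) AS docs(body)
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'car')
  AND to_tsvector('simple',  body) @@ to_tsquery('simple',  '!cars');
-- returns only 'a red car'
```

The second condition re-parses the original text, so in practice it would need its own tsvector column or index to be efficient.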

A related problem that's bothered me for some time is that lexemes
get stemmed even if there is a "*" (prefix match) marker on them,
causing them to possibly match much more than the user expected.
But again, it's not real obvious how to make that better given the
match-to-tsvector context --- not stemming could easily remove
desired matches to stemmed tsvector entries.
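The prefix-match behavior can be seen with the example from the PostgreSQL documentation, followed by a sketch of what goes wrong in the other direction if the prefix is left unstemmed.

```sql
-- 'postgres' is stemmed to 'postgr' before the prefix marker applies,
-- so the query matches the stemmed form of "postgraduate".
SELECT to_tsquery('english', 'postgres:*');                  -- 'postgr':*
SELECT to_tsvector('english', 'postgraduate')
       @@ to_tsquery('english', 'postgres:*');               -- true

-- Without stemming (raw cast), the prefix 'cars' is longer than the
-- stored lexeme 'car', so a match the user probably wanted is lost.
SELECT to_tsvector('english', 'cars') @@ 'cars:*'::tsquery;  -- false
```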

If we could think of a way for it to do something useful, my inclination
would be to allow an explicit "don't stem" marker on lexemes, rather
than trying to drive it off whether the context is a negation or not.

regards, tom lane
