Re: BUG #14278: Problem searching spanish words with accent mark outside the stem

From: Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>
To: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>
Cc: paco(at)hernandezgomez(dot)com, pgsql-bugs(at)postgresql(dot)org
Subject: Re: BUG #14278: Problem searching spanish words with accent mark outside the stem
Date: 2016-08-08 05:27:04
Message-ID: CAJGNTeMvhD=0Pb0qK_A9PX-bqzhisK5gUSgD4dF23rF2DC_vsg@mail.gmail.com
Lists: pgsql-bugs

On 7 August 2016 at 23:58, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com> wrote:
> paco(at)hernandezgomez(dot)com wrote:
>
>> Search without accent mark is not working correctly when the accent mark is
>> outside the stem of the word.
>
> I think it'd be better to apply unaccent() to both the stored text
> before ts_vectorization and to the query terms. That would reliably
> remove all diacritics (eñes too, though I suppose nobody would search
> for their ñandúes by writing nandú, so it's not as severe).
>
>

The problem is that unaccent() is STABLE rather than IMMUTABLE, so it can't
be used in an index expression. The OP would need to create a tsvector
column to store a preprocessed version of the string (one on which
to_tsvector('spanish', unaccent(...)) has already been run) and query
against that column.
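
A minimal sketch of the stored-column approach, under stated assumptions:
the table docs, its columns, and the trigger and index names are all
hypothetical, and the search word is only an illustration. The trigger is
one possible way to keep the column current; it is not something the
thread prescribes.

```sql
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Hypothetical table: body holds the original text, body_tsv the
-- preprocessed (unaccented, then stemmed) version.
CREATE TABLE IF NOT EXISTS docs (
    id       serial PRIMARY KEY,
    body     text NOT NULL,
    body_tsv tsvector
);

-- Recompute body_tsv whenever a row is inserted or updated.
CREATE OR REPLACE FUNCTION docs_tsv_refresh() RETURNS trigger
LANGUAGE plpgsql AS $$
BEGIN
    NEW.body_tsv := to_tsvector('spanish', unaccent(NEW.body));
    RETURN NEW;
END
$$;

CREATE TRIGGER docs_tsv_refresh_trg
    BEFORE INSERT OR UPDATE ON docs
    FOR EACH ROW EXECUTE PROCEDURE docs_tsv_refresh();

-- A plain column can be indexed regardless of unaccent()'s volatility.
CREATE INDEX docs_body_tsv_idx ON docs USING gin (body_tsv);

-- Apply unaccent() to the query terms as well, so both sides match.
SELECT id, body
FROM docs
WHERE body_tsv @@ plainto_tsquery('spanish', unaccent('canción'));
```

Rows that existed before the trigger was created would still need a
one-off UPDATE to populate body_tsv.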

<cough> or create an immutable version of unaccent() </cough>
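
A sketch of what such a wrapper might look like, with assumptions called
out: the name f_unaccent and the table docs are hypothetical, and the
schema-qualified calls assume the unaccent extension is installed in the
public schema. Marking the wrapper IMMUTABLE is a promise made to the
planner, not something PostgreSQL verifies; if the unaccent rules ever
change, indexes built on it would need a REINDEX.

```sql
CREATE EXTENSION IF NOT EXISTS unaccent;

-- Hypothetical wrapper. Naming the dictionary explicitly (two-argument
-- form) avoids depending on search_path at call time.
CREATE OR REPLACE FUNCTION f_unaccent(text) RETURNS text
LANGUAGE sql IMMUTABLE AS
$$ SELECT public.unaccent('public.unaccent'::regdictionary, $1) $$;

-- Hypothetical table used only for illustration.
CREATE TABLE IF NOT EXISTS docs (
    id   serial PRIMARY KEY,
    body text NOT NULL
);

-- The wrapper is declared IMMUTABLE, so an expression index is accepted.
CREATE INDEX docs_body_fts_idx ON docs
    USING gin (to_tsvector('spanish', f_unaccent(body)));

-- The query must repeat the indexed expression for the index to be used,
-- and should unaccent the search terms too.
SELECT id, body
FROM docs
WHERE to_tsvector('spanish', f_unaccent(body))
      @@ plainto_tsquery('spanish', f_unaccent('canción'));
```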

--
Jaime Casanova www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
