Re: BUG #18057: unaccent removes intentional spaces

From: Michael Paquier <michael(at)paquier(dot)xyz>
To: martin(at)schlossarek(dot)me, pgsql-bugs(at)lists(dot)postgresql(dot)org
Subject: Re: BUG #18057: unaccent removes intentional spaces
Date: 2023-08-18 06:18:58
Message-ID: ZN8NUpx2f9pB+F/g@paquier.xyz
Lists: pgsql-bugs

On Wed, Aug 16, 2023 at 09:00:43AM +0900, Michael Paquier wrote:
> Agreed that this looks incorrect as-is. This goes back as far as
> 9a206d0, where this was introduced, and it looks like the culprit is
> around initTrie() where the entries are loaded. See around t_isspace,
> for example.

I was looking at the code, and my first impression was right. Any
leading or trailing whitespace around the two characters listed in
the rule file is discarded. The thing is that we clearly document
the parsing rules for the sake of any custom files one can feed to the
extension:
https://www.postgresql.org/docs/devel/unaccent.html
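
To make the behavior concrete, here is a rough Python model of a
whitespace-delimited rule parser. It is only a sketch of the documented
format, not the C code in initTrie(); parse_rule and the sample rule
line are made up for illustration.

```python
def parse_rule(line):
    """Rough model: split a rule line into (source, target) on whitespace.

    Hypothetical helper, not the unaccent implementation. A line with a
    single field maps the source to the empty string (character removed).
    """
    fields = line.split()  # any run of spaces/tabs acts as a separator
    if not fields:
        return None  # blank line, nothing to load
    src = fields[0]
    trg = fields[1] if len(fields) > 1 else ""
    return src, trg


# The rule author wanted the replacement to be " 1/2" (leading space),
# but the space is consumed as part of the separator.
print(repr(parse_rule("\u00bd\t 1/2")))
```

With this kind of parsing there is no way to express a replacement that
begins or ends with a space.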

I am not sure what we can do here. Doing nothing is certainly an
option, but I am wondering if we could put in place an extra rule
where whitespaces can be part of the translated character if it uses
double quotes, for example. Thoughts?
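
As a sketch of what such a rule could look like, here is a small
Python model. The quoting syntax is purely hypothetical (nothing like
it exists in unaccent today), parse_rule_quoted is an invented name,
and escaping a literal double quote is deliberately left out.

```python
def parse_rule_quoted(line):
    """Hypothetical rule parser: a double-quoted target keeps its spaces.

    Unquoted targets behave as before (surrounding whitespace dropped).
    """
    parts = line.rstrip("\r\n").split(None, 1)  # source, then the rest
    if not parts:
        return None  # blank line
    src = parts[0]
    if len(parts) == 1:
        return src, ""  # single field: character is removed
    rest = parts[1].strip()
    if len(rest) >= 2 and rest[0] == '"' and rest[-1] == '"':
        return src, rest[1:-1]  # keep everything inside the quotes
    return src, rest.split()[0]  # unquoted: first token only


print(repr(parse_rule_quoted('\u00bd\t" 1/2"')))  # target keeps its space
print(repr(parse_rule_quoted("\u00e0\ta")))       # unquoted rule unchanged
```

One open point with this approach is a rule whose target is itself a
double quote, which would need some escaping convention.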
--
Michael
