BUG #18362: unaccent rules and Old Greek text

From: PG Bug reporting form <noreply(at)postgresql(dot)org>
To: pgsql-bugs(at)lists(dot)postgresql(dot)org
Cc: cees(dot)van(dot)zeeland(at)freedom(dot)nl
Subject: BUG #18362: unaccent rules and Old Greek text
Date: 2024-02-24 21:33:05
Message-ID: 18362-be6d0cfe122b6354@postgresql.org
Lists: pgsql-bugs

The following bug has been logged on the website:

Bug reference: 18362
Logged by: Cees van Zeeland
Email address: cees(dot)van(dot)zeeland(at)freedom(dot)nl
PostgreSQL version: 15.6
Operating system: Windows 11
Description:

I am using PostgreSQL Server 15.6-1 with UTF-8 encoding.

I am struggling with the unaccent extension and "Old Greek" characters.
To reproduce the behaviour I encountered, try this:

1. Create a table with one text field:

CREATE TABLE IF NOT EXISTS public.test
(
    entry text COLLATE pg_catalog."default" NOT NULL,
    CONSTRAINT test_pkey PRIMARY KEY (entry)
);

2. Insert the following Greek words, which have stress accents on the vowels,
or import a CSV file with the same items:
ἀνήρ (== man)
πέντε (== five)
γίγας (== giant)
γράφω (== write)
δύο (== two)
ἐγώ (== I)
θεός (== god)

3. Create the following view for searching:

CREATE OR REPLACE VIEW public.test_view
AS
SELECT test.entry,
       COALESCE(
           array_to_string(
               ts_lexize('unaccent'::regdictionary,
                         replace(test.entry, 'ς'::text, 'σ'::text)),
               ''::text),
           replace(test.entry, 'ς'::text, 'σ'::text)) AS search_entry
FROM test
ORDER BY test.entry;

4. Check whether it works:

SELECT entry, search_entry FROM public.test_view;

The result shows that not all diacritics are removed.
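
To see exactly which characters survive, the output of the view can be split into single characters and each one reported with its code point. This is only a sketch against the test_view defined above. The aliases v, c, ch and codepoint_hex are made up for illustration, and it assumes a UTF-8 database, where ascii() returns the Unicode code point of a character.

```sql
-- List every character of search_entry that is not one of the plain
-- Greek small letters α..ω (U+03B1..U+03C9, decimal 945..969),
-- together with its code point in hexadecimal.
SELECT v.entry,
       c.ch,
       to_hex(ascii(c.ch)) AS codepoint_hex
FROM public.test_view AS v
CROSS JOIN LATERAL regexp_split_to_table(v.search_entry, '') AS c(ch)
WHERE ascii(c.ch) NOT BETWEEN 945 AND 969;
```

Any row whose code point falls in the Greek Extended block (1f00 to 1fff) points at a character that the rules file did not map.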

When I look in unaccent.rules, around line 530 I see characters that
look the same but are in fact different, for example:
Greek Small Letter Epsilon with Tonos
versus
Greek Small Letter Epsilon with Oxia
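
The difference between the two can be made visible with Unicode escapes, which avoids any doubt about what a font or editor displays. The sketch below assumes PostgreSQL 13 or later with a UTF-8 server encoding, since it relies on the built-in normalize() function. The column aliases are illustrative only.

```sql
-- U+03AD: GREEK SMALL LETTER EPSILON WITH TONOS
-- U+1F73: GREEK SMALL LETTER EPSILON WITH OXIA
SELECT U&'\03AD' AS tonos,
       U&'\1F73' AS oxia,
       -- plain comparison of the two stored strings
       U&'\03AD' = U&'\1F73' AS equal_as_stored,
       -- comparison after normalizing the Oxia form to NFC
       normalize(U&'\1F73', NFC) = U&'\03AD' AS equal_after_nfc;
```

Unicode defines U+1F73 as canonically equivalent to U+03AD, so the two strings are expected to differ as stored and to compare equal once the Oxia form has been normalized to NFC.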

I found here a discussion about this subject:

https://ibiblio.org/bgreek/forum/viewtopic.php?t=4170

So there are reasons to keep the current unaccent.rules as it is, but
there are also reasons to add a few lines to it, for example after line 955,
covering the seven Greek vowels with Oxia.
Please add:
ά α
έ ε
ή η
ί ι
ό ο
ύ υ
ώ ω

It would solve the problem and make searching through Old Greek texts a lot
easier.
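
Until the rules file changes, one possible workaround is to normalize the text to NFC before unaccenting, because NFC replaces each of the seven vowels with Oxia by its Tonos counterpart. The following is a sketch only: the view name test_view_nfc is hypothetical, it assumes PostgreSQL 13 or later with a UTF-8 server encoding, and it uses the extension's unaccent() function in place of the ts_lexize() call from the original view. It helps only if the leftover characters are in fact Oxia code points.

```sql
-- Same idea as test_view, but the entry is normalized to NFC first,
-- so vowels with Oxia become vowels with Tonos before unaccent sees them.
CREATE OR REPLACE VIEW public.test_view_nfc AS
SELECT t.entry,
       unaccent(normalize(replace(t.entry, 'ς', 'σ'), NFC)) AS search_entry
FROM public.test AS t
ORDER BY t.entry;

SELECT entry, search_entry FROM public.test_view_nfc;
```

A search term typed by a user would need the same normalize() and unaccent() treatment before being compared with search_entry.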

Thanks for your help,

Cees van Zeeland
