[PATCH] Add Hebrew and Arabic combining characters to unaccent.rules

From: e3718e7(at)tutamail(dot)com
To: Pgsql Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: [PATCH] Add Hebrew and Arabic combining characters to unaccent.rules
Date: 2025-08-22 20:10:14
Message-ID: OYIOnWD--F-9@tutamail.com
Lists: pgsql-hackers

Hi,

This patch adds the combining diacritical mark ranges in the Hebrew and Arabic Unicode blocks (cantillation marks, vowel marks, etc.) to the list of code points that `unaccent` strips. A few punctuation code points are interspersed among the marks, so the ranges cannot be merged into larger contiguous blocks.
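To illustrate why the ranges end up fragmented, here is a minimal sketch that lists the runs of combining marks in each block using Python's standard `unicodedata` module. It is not how the patch was produced and does not reflect the exact contents of `unaccent.rules`; the helper name `combining_ranges`, the block boundaries passed to it, and the "general category starts with M" test are assumptions made for the example, and the output depends on the Unicode version shipped with the Python interpreter.

```python
import unicodedata


def combining_ranges(start, end):
    """Return inclusive (first, last) runs of combining marks in [start, end].

    Hypothetical helper for illustration only: a code point counts as a
    combining mark here when its general category starts with "M".
    """
    ranges = []
    run_start = None
    for cp in range(start, end + 1):
        is_mark = unicodedata.category(chr(cp)).startswith("M")
        if is_mark and run_start is None:
            run_start = cp                      # a new run begins
        elif not is_mark and run_start is not None:
            ranges.append((run_start, cp - 1))  # run ended just before cp
            run_start = None
    if run_start is not None:
        ranges.append((run_start, end))         # run reaches the block end
    return ranges


# Assumed block boundaries: Hebrew U+0590..U+05FF, Arabic U+0600..U+06FF.
hebrew = combining_ranges(0x0590, 0x05FF)
arabic = combining_ranges(0x0600, 0x06FF)

for name, ranges in (("Hebrew", hebrew), ("Arabic", arabic)):
    for first, last in ranges:
        print(f"{name}: U+{first:04X}..U+{last:04X}")
```

In the Hebrew block, for example, U+05BE (HEBREW PUNCTUATION MAQAF) is punctuation rather than a mark, so the run that starts at U+0591 stops at U+05BD and a new one starts at U+05BF.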

Attachment: 0001-Add-Hebrew-and-Arabic-combining-characters-to-unacce.patch (application/octet-stream, 2.9 KB)
