Re: PATCH: Allow empty targets in unaccent dictionary

From: David Fetter <david(at)fetter(dot)org>
To: Mohammad Alhashash <alhashash(at)alhashash(dot)net>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: PATCH: Allow empty targets in unaccent dictionary
Date: 2014-04-21 04:21:04
Message-ID: 20140421042104.GI24095@fetter.org
Lists: pgsql-hackers

Please add this to the next commitfest.

https://commitfest.postgresql.org/action/commitfest_view?id=22

Cheers,
David.

On Sun, Apr 20, 2014 at 01:06:43AM +0200, Mohammad Alhashash wrote:
> Hi,
>
> Currently, the unaccent extension only allows replacing one source
> character with one or more target characters. In Arabic, Hebrew and
> possibly other languages, diacritics are standalone characters that
> are added to normal letters. To use the unaccent dictionary for
> these languages, we need to allow empty targets so that diacritics
> are removed instead of replaced.
>
> The attached patch modifies unaccent.c so that the dictionary parser
> uses a zero-length target when the line has no target.
>
> Best Regards,
>
> Mohammad Alhashash
>

> diff --git a/contrib/unaccent/unaccent.c b/contrib/unaccent/unaccent.c
> old mode 100644
> new mode 100755
> index a337df6..4e72829
> --- a/contrib/unaccent/unaccent.c
> +++ b/contrib/unaccent/unaccent.c
> @@ -58,7 +58,9 @@ placeChar(TrieChar *node, unsigned char *str, int lenstr, char *replaceTo, int r
> {
> curnode->replacelen = replacelen;
> curnode->replaceTo = palloc(replacelen);
> - memcpy(curnode->replaceTo, replaceTo, replacelen);
> + /* palloc(0) returns a valid address, not NULL */
> + if (replaceTo) /* memcpy() is undefined for NULL pointers*/
> + memcpy(curnode->replaceTo, replaceTo, replacelen);
> }
> }
> else
> @@ -105,10 +107,10 @@ initTrie(char *filename)
> while ((line = tsearch_readline(&trst)) != NULL)
> {
> /*
> - * The format of each line must be "src trg" where src and trg
> + * The format of each line must be "src [trg]" where src and trg
> * are sequences of one or more non-whitespace characters,
> * separated by whitespace. Whitespace at start or end of
> - * line is ignored.
> + * line is ignored. If no trg added, a zero-length string is used.
> */
> int state;
> char *ptr;
> @@ -160,6 +162,13 @@ initTrie(char *filename)
> }
> }
>
> + /* if no trg (loop stops at state 1 or 2), use zero-length target */
> + if (state == 1 || state == 2)
> + {
> + trglen = 0;
> + state = 5;
> + }
> +
> if (state >= 3)
> rootTrie = placeChar(rootTrie,
> (unsigned char *) src, srclen,

>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
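For anyone reviewing the patch, the "src [trg]" handling it describes can be approximated outside the tree. What follows is only a rough sketch, not the code in unaccent.c: the names Rule, parse_rule and copy_target are hypothetical, it walks the line byte by byte instead of using the server's multibyte helpers, and it assumes the default "C" locale so that only ASCII whitespace separates fields.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical container for one parsed rule; not a PostgreSQL type. */
typedef struct
{
    const char *src;    /* start of source sequence inside the line */
    size_t      srclen;
    const char *trg;    /* start of target, or NULL when absent */
    size_t      trglen;
} Rule;

/*
 * Parse one "src [trg]" line. Returns 1 for a usable rule and 0 for a
 * blank or malformed line. States follow the ones named in the patch:
 * 0 = before src, 1 = in src, 2 = after src, 3 = in trg, 4 = after trg.
 */
static int
parse_rule(const char *line, Rule *rule)
{
    int         state = 0;
    const char *p;

    assert(line != NULL && rule != NULL);
    rule->src = rule->trg = NULL;
    rule->srclen = rule->trglen = 0;

    for (p = line; *p; p++)
    {
        /* whitespace is skipped, but it ends src or trg */
        if (isspace((unsigned char) *p))
        {
            if (state == 1)
                state = 2;
            else if (state == 3)
                state = 4;
            continue;
        }
        switch (state)
        {
            case 0: rule->src = p; rule->srclen = 1; state = 1; break;
            case 1: rule->srclen++; break;
            case 2: rule->trg = p; rule->trglen = 1; state = 3; break;
            case 3: rule->trglen++; break;
            default: state = -1; break;     /* extra field: bogus line */
        }
    }

    /* no trg (loop stopped in state 1 or 2): use a zero-length target */
    if (state == 1 || state == 2)
    {
        rule->trglen = 0;
        state = 5;
    }
    return state >= 3;
}

/*
 * Copy the target into out (capacity cap). Returns the number of bytes
 * written, or -1 if it does not fit. memcpy() is skipped for an empty
 * target because its pointer is NULL in that case.
 */
static int
copy_target(const Rule *rule, char *out, size_t cap)
{
    if (rule->trglen > cap)
        return -1;
    if (rule->trglen > 0)
        memcpy(out, rule->trg, rule->trglen);
    return (int) rule->trglen;
}
```

With this sketch, a line holding only a UTF-8 diacritic (for example the two bytes 0xD9 0x8B) parses as a valid rule with trglen equal to 0, while a line with three fields is still rejected.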

--
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter
Skype: davidfetter XMPP: david(dot)fetter(at)gmail(dot)com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate
