Re: [PROPOSAL] Improvements of Hunspell dictionaries support

From: Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PROPOSAL] Improvements of Hunspell dictionaries support
Date: 2015-11-10 10:23:47
Message-ID: 5641C5B3.7000301@postgrespro.ru
Lists: pgsql-hackers

On 08.11.2015 14:23, Artur Zakirov wrote:
> Thank you for the reply.
>
> This was because of the size of the flag field in the SPELL struct: long
> flags in the .dict file were being truncated.
>
> I have attached a new patch. It is a temporary patch, not the final one,
> and it can be done better.
>

I have updated the patch and attached it. The flag field of the SPELL
struct is now allocated dynamically.
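
The difference between a fixed-size flag field and a dynamically allocated
one can be illustrated with a minimal sketch. Everything here is an
assumption made for illustration: DictEntry, FIXED_FLAG_LEN, copy_flag_fixed
and entry_create are hypothetical names, not the SPELL struct or the code
from the patch, and plain malloc stands in for whatever allocator the patch
actually uses.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical fixed buffer size; the real field size is not given here. */
#define FIXED_FLAG_LEN 16

/* Hypothetical, simplified dictionary entry (not the real SPELL struct). */
typedef struct DictEntry
{
    char *word;   /* the dictionary word */
    char *flag;   /* affix flags, allocated to fit their real length */
} DictEntry;

/* Copy a string into freshly allocated memory; returns NULL on failure. */
char *
copy_string(const char *src)
{
    size_t len = strlen(src) + 1;
    char  *dst = malloc(len);

    if (dst != NULL)
        memcpy(dst, src, len);
    return dst;
}

/* Fixed-size variant: flags longer than the buffer are cut off. */
void
copy_flag_fixed(char dst[FIXED_FLAG_LEN], const char *src)
{
    strncpy(dst, src, FIXED_FLAG_LEN - 1);
    dst[FIXED_FLAG_LEN - 1] = '\0';
}

/* Dynamic variant: the flag string keeps its full length. */
DictEntry *
entry_create(const char *word, const char *flag)
{
    DictEntry *e;

    assert(word != NULL && flag != NULL);

    e = malloc(sizeof(DictEntry));
    if (e == NULL)
        return NULL;

    e->word = copy_string(word);
    e->flag = copy_string(flag);
    if (e->word == NULL || e->flag == NULL)
    {
        free(e->word);
        free(e->flag);
        free(e);
        return NULL;
    }
    return e;
}

/* Release an entry created by entry_create. */
void
entry_free(DictEntry *e)
{
    if (e == NULL)
        return;
    free(e->word);
    free(e->flag);
    free(e);
}
```

With a 26-character flag string, copy_flag_fixed keeps only the first 15
characters, while entry_create stores the whole string at the cost of one
extra allocation per entry.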

I have measured the dictionary loading time and the memory used by a
dictionary with the new patch. A dictionary is loaded on the first
reference to it, for example when the ts_lexize function is executed, so
the first ts_lexize call takes more time than the second.
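
Loading on first reference is why only the first call is slow. The pattern
can be sketched as follows; this is a generic illustration and not
PostgreSQL's dictionary cache. DictCache, load_dictionary and dict_lookup
are hypothetical names, and the "load" is a stand-in for parsing the
dictionary and affix files.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical cache slot for one dictionary. */
typedef struct DictCache
{
    bool loaded;      /* has the expensive load already run? */
    int  load_count;  /* number of times the load was performed */
    int  nwords;      /* stand-in for the loaded dictionary data */
} DictCache;

/* Stand-in for the expensive step (reading and parsing the files). */
void
load_dictionary(DictCache *c)
{
    assert(c != NULL);
    c->nwords = 3;    /* placeholder payload, not real data */
    c->load_count++;
    c->loaded = true;
}

/* Load on first use only; later calls reuse the loaded data. */
int
dict_lookup(DictCache *c)
{
    assert(c != NULL);
    if (!c->loaded)
        load_dictionary(c);
    return c->nwords;
}
```

Calling dict_lookup twice on a zero-initialized DictCache runs
load_dictionary once, which mirrors the first call paying the loading cost
and the second one skipping it.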

The following table shows the performance of some dictionaries before and
after the patch on my computer.

-------------------------------------------------
|       | loading time, ms  | memory, MB        |
|       | before  | after   | before  | after   |
-------------------------------------------------
| ar    |     700 |     300 |    23.7 |    15.7 |
| br_fr |     410 |     450 |    27.4 |    27.5 |
| ca    |     248 |     245 |    14.7 |    15.4 |
| en_us |     100 |     100 |     5.4 |     6.2 |
| fr    |     160 |     178 |    13.7 |    14.1 |
| gl_es |     160 |     150 |       9 |     9.4 |
| is    |     260 |     202 |    16.1 |    16.3 |
-------------------------------------------------

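As you can see, loading time and memory usage are essentially the same
before and after the patch for most dictionaries. The exception is ar,
which loads faster and uses less memory after the patch.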

Link to patch in commitfest:
https://commitfest.postgresql.org/8/420/

Link to regression tests:
https://dl.dropboxusercontent.com/u/15423817/HunspellDictTest.tar.gz

--
Artur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

Attachment Content-Type Size
hunspell_dict.patch text/x-patch 20.8 KB
