
WIP: shared ispell dictionary

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: WIP: shared ispell dictionary
Date: 2010-03-18 10:33:46
Message-ID: 162867791003180333s1933e5b7g9208dd9a2bb681c6@mail.gmail.com
Lists: pgsql-hackers
Hello

The attached patch makes it possible to share an ispell dictionary between
processes. The motivation is the slowness of the first tsearch query and
the amount of memory allocated per process. When I tested loading the
ispell dictionary for Czech, it took about 500 ms and 48 MB. With a simple
allocator it uses only 25 MB. If we remove some checks and the tolower
string transformation from the loading stage, loading takes only 200 ms,
but then a broken dict or affix file can produce wrong results. This
patch significantly reduces the load on servers that use ispell
dictionaries.
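
To illustrate what "simple allocator" can mean here, below is a minimal
sketch of a bump (arena) allocator: every request is carved from one
preallocated region, so there is no per-chunk bookkeeping and nothing is
freed individually. The names Arena, arena_init, arena_alloc and
arena_strdup are hypothetical and are not taken from the patch; the
patch's real allocator may work differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration only; not the allocator from the patch. */
typedef struct Arena {
    unsigned char *base;  /* start of the preallocated region */
    size_t size;          /* total bytes in the region */
    size_t used;          /* bytes handed out so far, including padding */
} Arena;

static void arena_init(Arena *a, void *buf, size_t size)
{
    a->base = buf;
    a->size = size;
    a->used = 0;
}

/* Return n bytes aligned to `align` (a power of two), or NULL when the
 * region is exhausted. Nothing is ever freed individually. */
static void *arena_alloc(Arena *a, size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t start = (uintptr_t) a->base + a->used;
    uintptr_t aligned = (start + (align - 1)) & ~(uintptr_t) (align - 1);
    size_t offset = (size_t) (aligned - (uintptr_t) a->base);

    if (offset > a->size || n > a->size - offset)
        return NULL;

    a->used = offset + n;
    return a->base + offset;
}

/* Copy a NUL-terminated string (for example a dictionary word) into the
 * arena; returns NULL when there is not enough room. */
static char *arena_strdup(Arena *a, const char *s)
{
    size_t len = strlen(s) + 1;
    char *p = arena_alloc(a, len, 1);

    if (p != NULL)
        memcpy(p, s, len);
    return p;
}
```

Because short strings are packed back to back with no header in front of
each one, a dictionary with many small entries wastes much less space than
with a general-purpose allocator. The price is that memory can only be
released all at once.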

I know that Tom is worried about the use of shared memory. I think the
worry is unnecessary: once loaded, the dictionary data are only read, never
modified. A second idea: this dictionary template could be distributed as a
separate project (it needs only a few changes in core, plus the simple
allocator).
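
The "load once, then only read" idea can be sketched outside PostgreSQL
with plain POSIX calls: a parent fills a shared mapping, makes it
read-only, and a forked child reads the same bytes without loading
anything itself. This is only an illustration under that assumption; the
helper name child_sees_shared_word is hypothetical, and the patch uses
PostgreSQL's own shared memory rather than mmap.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: returns 0 if a forked child reads `word` from a
 * region the parent filled and then made read-only, 1 if the child saw
 * different data, and -1 on error. */
static int child_sees_shared_word(const char *word)
{
    assert(word != NULL);

    size_t len = strlen(word) + 1;
    char *region = mmap(NULL, len, PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (region == MAP_FAILED)
        return -1;

    /* "Load the dictionary" once, then forbid further writes. */
    memcpy(region, word, len);
    if (mprotect(region, len, PROT_READ) != 0) {
        munmap(region, len);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        munmap(region, len);
        return -1;
    }
    if (pid == 0) {
        /* Child: only reads the inherited mapping. */
        _exit(strcmp(region, word) == 0 ? 0 : 1);
    }

    int status = 0;
    pid_t waited = waitpid(pid, &status, 0);
    munmap(region, len);
    if (waited < 0 || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Since no process writes to the region after loading, readers need no
locking in this sketch.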

Usage:

a) set shared_data = 26MB (postgresql.conf)
b) restart
c) register the dictionary with the option "share = yes"

CREATE TEXT SEARCH DICTIONARY cspell
   (template=ispell, dictfile = czech, afffile=czech, stopwords=czech,
share = yes);


[pavel(at)nemesis src]$ psql-dev3 postgres
Timing is on.
psql-dev3 (9.0devel)
Type "help" for help.

postgres=# select * from ts_debug('cs','Příliš žluťoučký kůň se napil žluté vody');
   alias   |    description    |   token   |  dictionaries   | dictionary |   lexemes
-----------+-------------------+-----------+-----------------+------------+-------------
 word      | Word, all letters | Příliš    | {cspell,simple} | cspell     | {příliš}
 blank     | Space symbols     |           | {}              |            |
 word      | Word, all letters | žluťoučký | {cspell,simple} | cspell     | {žluťoučký}
 blank     | Space symbols     |           | {}              |            |
 word      | Word, all letters | kůň       | {cspell,simple} | cspell     | {kůň}
 blank     | Space symbols     |           | {}              |            |
 asciiword | Word, all ASCII   | se        | {cspell,simple} | cspell     | {}
 blank     | Space symbols     |           | {}              |            |
 asciiword | Word, all ASCII   | napil     | {cspell,simple} | cspell     | {napít}
 blank     | Space symbols     |           | {}              |            |
 word      | Word, all letters | žluté     | {cspell,simple} | cspell     | {žlutý}
 blank     | Space symbols     |           | {}              |            |
 asciiword | Word, all ASCII   | vody      | {cspell,simple} | cspell     | {voda}
(13 rows)

Time: 8,178 ms  <<-- without the patch: 500 ms

Limits and ToDo:
a) it supports only simple regular expressions
b) it doesn't handle cache reset or shared memory deallocation

Regards
Pavel Stehule

Attachment: shared_dictionary_02.diff
Description: application/octet-stream (40.9 KB)
