Re: WIP: shared ispell dictionary

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Teodor Sigaev <teodor(at)sigaev(dot)ru>
Subject: Re: WIP: shared ispell dictionary
Date: 2010-03-18 15:08:39
Message-ID: 162867791003180808p49a047cfj72d1d89ce5121d9e@mail.gmail.com
Lists: pgsql-hackers

2010/3/18 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> I know so Tom worries about using of share memory.
>
> You're right, and if I have any say in the matter no patch like this
> will ever go in.
>
> What I would suggest looking into is some way of preprocessing the raw
> text dictionary file into a format that can be slurped into memory
> quickly.  The main problem compared to the way things are done now
> is that the current internal format relies heavily on pointers.
> Maybe you could replace those by offsets?

Then you have to maintain a new application :( That can bring a new
kind of bug.
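
To make the offset idea from the quoted text concrete, here is a minimal
sketch. It is not taken from the ispell code: FlatNode, FlatDict and the
flat_* functions are hypothetical names, and the layout (a byte trie
stored in one array, with array indexes instead of pointers) is only an
assumption about what such a format could look like. Because no link is
an address, the array can be written to a file and read back, or copied
to any other address, and lookups still work.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat node: links are array indexes, not pointers. */
typedef struct FlatNode
{
    uint32_t child;     /* index of first child, 0 = none */
    uint32_t sibling;   /* index of next sibling, 0 = none */
    uint8_t  ch;        /* byte matched by this node */
    uint8_t  is_word;   /* 1 if a word ends at this node */
} FlatNode;

typedef struct FlatDict
{
    FlatNode *nodes;    /* nodes[0] is the root and is never a child */
    uint32_t  used;     /* number of nodes in use */
    uint32_t  cap;      /* allocated number of nodes */
} FlatDict;

/* Allocate a zeroed node array; returns 0 on success, -1 on failure. */
static int
flat_init(FlatDict *d, uint32_t cap)
{
    if (cap == 0)
        return -1;
    d->nodes = calloc(cap, sizeof(FlatNode));
    if (d->nodes == NULL)
        return -1;
    d->used = 1;        /* slot 0 is the root */
    d->cap = cap;
    return 0;
}

/* Insert a word; returns 0 on success, -1 if the array is full. */
static int
flat_add(FlatDict *d, const char *word)
{
    uint32_t cur = 0;

    for (const unsigned char *p = (const unsigned char *) word; *p; p++)
    {
        uint32_t n = d->nodes[cur].child;

        /* walk the sibling chain looking for this byte */
        while (n != 0 && d->nodes[n].ch != *p)
            n = d->nodes[n].sibling;

        if (n == 0)
        {
            if (d->used == d->cap)
                return -1;
            n = d->used++;
            d->nodes[n].ch = *p;
            d->nodes[n].sibling = d->nodes[cur].child;
            d->nodes[cur].child = n;
        }
        cur = n;
    }
    d->nodes[cur].is_word = 1;
    return 0;
}

/* Look a word up using only the node array; returns 1 if present. */
static int
flat_lookup(const FlatNode *nodes, const char *word)
{
    uint32_t cur = 0;

    for (const unsigned char *p = (const unsigned char *) word; *p; p++)
    {
        uint32_t n = nodes[cur].child;

        while (n != 0 && nodes[n].ch != *p)
            n = nodes[n].sibling;
        if (n == 0)
            return 0;
        cur = n;
    }
    return nodes[cur].is_word;
}
```

In this sketch the first "used * sizeof(FlatNode)" bytes of the array
are the whole on-disk image: a preprocessing tool would call flat_add()
for every word and dump those bytes, and a backend would read them in
one go and call flat_lookup() directly. A real format would also need a
header with a version and byte order, which is left out here.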

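I am playing with a preload solution now, and I found a new issue.

I don't know why, but when I preload a library that uses a lot of
memory, like ispell, all subsequent operations are at least ten times
slower :( (in the session below a plain "select 10" goes from about
0.3 ms to about 12.6 ms)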

[pavel(at)nemesis tsearch]$ psql-dev3 postgres
Timing is on.
psql-dev3 (9.0devel)
Type "help" for help.

postgres=# select 10;
?column?
----------
10
(1 row)

Time: 0,611 ms
postgres=# select 10;
?column?
----------
10
(1 row)

Time: 0,277 ms
postgres=# select 10;
?column?
----------
10
(1 row)

Time: 0,266 ms
postgres=# select 10;
?column?
----------
10
(1 row)

Time: 0,348 ms
postgres=# select * from ts_debug('cs','Jmenuji se Pavel Stěhule a bydlím ve Skalici');
   alias   |    description    |  token  |       dictionaries        |    dictionary    |    lexemes
-----------+-------------------+---------+---------------------------+------------------+----------------
 asciiword | Word, all ASCII   | Jmenuji | {preloaded_cspell,simple} | preloaded_cspell | {jmenovat}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | se      | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | Pavel   | {preloaded_cspell,simple} | preloaded_cspell | {pavel,pavla}
 blank     | Space symbols     |         | {}                        |                  |
 word      | Word, all letters | Stěhule | {preloaded_cspell,simple} | preloaded_cspell | {stěhule}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | a       | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 word      | Word, all letters | bydlím  | {preloaded_cspell,simple} | preloaded_cspell | {bydlet,bydlit}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | ve      | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | Skalici | {preloaded_cspell,simple} | preloaded_cspell | {skalice}
(15 rows)

Time: 24,495 ms
postgres=# select * from ts_debug('cs','Jmenuji se Pavel Stěhule a bydlím ve Skalici');
   alias   |    description    |  token  |       dictionaries        |    dictionary    |    lexemes
-----------+-------------------+---------+---------------------------+------------------+----------------
 asciiword | Word, all ASCII   | Jmenuji | {preloaded_cspell,simple} | preloaded_cspell | {jmenovat}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | se      | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | Pavel   | {preloaded_cspell,simple} | preloaded_cspell | {pavel,pavla}
 blank     | Space symbols     |         | {}                        |                  |
 word      | Word, all letters | Stěhule | {preloaded_cspell,simple} | preloaded_cspell | {stěhule}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | a       | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 word      | Word, all letters | bydlím  | {preloaded_cspell,simple} | preloaded_cspell | {bydlet,bydlit}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | ve      | {preloaded_cspell,simple} | preloaded_cspell | {}
 blank     | Space symbols     |         | {}                        |                  |
 asciiword | Word, all ASCII   | Skalici | {preloaded_cspell,simple} | preloaded_cspell | {skalice}
(15 rows)

Time: 18,426 ms
postgres=# select 10;
?column?
----------
10
(1 row)

Time: 12,700 ms
postgres=# select 10;
?column?
----------
10
(1 row)

Time: 12,465 ms
postgres=# select 10;
?column?
----------
10
(1 row)

Time: 12,603 ms
postgres=# select 10;
?column?
----------
10
(1 row)

Time: 12,901 ms
postgres=# select 10;
?column?
----------
10
(1 row)

Time: 12,642 ms

When I reduce memory usage with a simple allocator, the issue goes
away, but it is strange.
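
For readers unfamiliar with the term, "simple allocator" can be read as
something like a bump allocator: many small objects are carved out of a
few large chunks, so per-object bookkeeping disappears and everything is
freed at once. The sketch below only illustrates that reading; it is not
the code from the patch, and Arena, arena_alloc, arena_free_all and the
1 MB chunk size are all hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

#define ARENA_ALIGN    (_Alignof(max_align_t))
#define ARENA_ROUND(n) (((n) + ARENA_ALIGN - 1) & ~(ARENA_ALIGN - 1))
#define CHUNK_PAYLOAD  ((size_t) 1024 * 1024)   /* assumed chunk size */

typedef struct ArenaChunk
{
    struct ArenaChunk *next;    /* previously filled chunk */
    size_t      used;           /* payload bytes handed out so far */
    size_t      size;           /* payload bytes available in total */
} ArenaChunk;

typedef struct Arena
{
    ArenaChunk *head;           /* chunk currently being filled */
    size_t      nchunks;        /* number of chunks allocated */
} Arena;

/*
 * Hand out n bytes from the current chunk, starting a new chunk when
 * the current one is too full.  Returns NULL if malloc fails.
 * Individual objects cannot be freed; see arena_free_all().
 */
static void *
arena_alloc(Arena *a, size_t n)
{
    size_t      hdr = ARENA_ROUND(sizeof(ArenaChunk));
    void       *p;

    n = ARENA_ROUND(n ? n : 1);     /* keep every object aligned */

    if (a->head == NULL || a->head->size - a->head->used < n)
    {
        size_t      payload = n > CHUNK_PAYLOAD ? n : CHUNK_PAYLOAD;
        ArenaChunk *c = malloc(hdr + payload);

        if (c == NULL)
            return NULL;
        c->next = a->head;
        c->used = 0;
        c->size = payload;
        a->head = c;
        a->nchunks++;
    }

    p = (char *) a->head + hdr + a->head->used;
    a->head->used += n;
    return p;
}

/* Release every chunk, and with it every object, in one pass. */
static void
arena_free_all(Arena *a)
{
    while (a->head != NULL)
    {
        ArenaChunk *c = a->head;

        a->head = c->next;
        free(c);
    }
    a->nchunks = 0;
}
```

With this kind of allocator a dictionary with hundreds of thousands of
small nodes needs only a handful of malloc calls. Whether that explains
the timing difference above is not established here.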

Pavel

>
>                        regards, tom lane
>
