Re: patch: preload dictionary new version

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: preload dictionary new version
Date: 2010-07-08 12:20:51
Message-ID: AANLkTim02dngqQfnSeYASVDu7jhWEbWRAzjGpdj-ABZL@mail.gmail.com
Lists: pgsql-hackers

2010/7/8 Robert Haas <robertmhaas(at)gmail(dot)com>:
> On Thu, Jul 8, 2010 at 7:03 AM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
>> 2010/7/8 Robert Haas <robertmhaas(at)gmail(dot)com>:
>>> On Wed, Jul 7, 2010 at 10:50 PM, Takahiro Itagaki
>>> <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp> wrote:
>>>> This patch allocates memory with non-file-based mmap() to preload text search
>>>> dictionary files at the server start. Note that dict files are not mmap'ed
>>>> directly in the patch; mmap() is used for reallocatable shared memory.
>>>
>>> I thought someone (Tom?) had previously proposed the idea of writing a
>>> dictionary precompiler that would produce a file which could then be
>>> mmap()'d into the backend.  Has any thought been given to that
>>> approach?
>>
>> A precompiler can only save some of the time spent on parsing, but that
>> isn't the main issue. Without simple allocation the dictionary data
>> takes about 55 MB; with simple allocation, about 10 MB. If you have
>> 100 sessions, this data can be repeated 100 times in memory:
>> about 1 GB (for the Czech dictionary).  I think the memory can be used
>> better.
>
> A precompiler can give you all the same memory management benefits.
>
>> At a minimum you have to read these 10 MB from disk (maybe from the
>> file cache), which takes some time too, but it would be significantly
>> better than now.
>
> If you use mmap(), you don't need to do anything of the sort.  And the
> EXEC_BACKEND case doesn't require as many gymnastics, either.  And the
> variable can be PGC_SIGHUP or even PGC_USERSET instead of
> PGC_POSTMASTER.

I do use mmap(). And with mmap() a precompiler is not necessary.
The dictionary is loaded only once, in the original ispell format. I
think that is much simpler for administration: you just copy the ispell
files. There are no potential problems with binary incompatibility,
you don't need to solve serialisation and deserialisation, and you
don't need to copy the TSearch ispell parser code into a client
application. We would probably still want to support uncompiled
ispell dictionaries anyway. Using a precompiler also raises new
questions for upgrades!
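
To illustrate the sharing that a non-file-based mmap() gives, here is a
minimal sketch. It is not code from the patch: the function name and the
one-byte marker are made up for the example, and it assumes a POSIX system
with MAP_ANONYMOUS. The parent loads the data once, forks, and the child
writes a marker that the parent can then see, which shows both processes
use the same pages rather than private copies.

```c
#define _DEFAULT_SOURCE         /* MAP_ANONYMOUS under strict -std modes on glibc */
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical demo: copy "data" once into an anonymous shared mapping,
 * fork, and let the child set a marker byte placed just after the data.
 * Returns 1 if the child read the data and the parent saw the marker,
 * 0 if not, and -1 on a system error.
 */
static int
child_sees_preloaded_data(const char *data)
{
    assert(data != NULL);

    size_t  len = strlen(data) + 1;     /* include the terminating NUL */
    char   *shared = mmap(NULL, len + 1, PROT_READ | PROT_WRITE,
                          MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (shared == MAP_FAILED)
        return -1;

    memcpy(shared, data, len);          /* the one-time "preload" */
    /* shared[len] is the marker; anonymous mappings start zero-filled */

    pid_t   pid = fork();

    if (pid < 0)
    {
        munmap(shared, len + 1);
        return -1;
    }
    if (pid == 0)
    {
        /* child: the data is already there, nothing is loaded again */
        if (strcmp(shared, data) == 0)
            shared[len] = 1;
        _exit(0);
    }

    int     status = 0;
    int     result = -1;

    if (waitpid(pid, &status, 0) == pid && WIFEXITED(status))
        result = (shared[len] == 1);    /* child's write is visible here */

    munmap(shared, len + 1);
    return result;
}
```

With a private allocation the child would only modify its own copy after
fork(), so the marker would never show up in the parent.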

The real problem is which API to use on MS Windows, where mmap() doesn't exist.

I think we can divide this problem into three parts:

a) simple allocator - it can be used for more than just TSearch dictionaries.
b) sharing the data - this is important for large dictionaries.
c) preloading - it decreases the load time of the first TSearch query.
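
As a rough idea of what part (a) could look like, here is a minimal
sketch of a bump allocator on top of an anonymous shared mapping. The names
SimpleArena, arena_init, arena_alloc and arena_destroy are hypothetical and
are not taken from the patch; the 8-byte alignment is also just an
assumption. Chunks have no per-allocation header and are never freed
individually, which is where the memory saving would come from.

```c
#define _DEFAULT_SOURCE         /* MAP_ANONYMOUS under strict -std modes on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical arena: one contiguous region, handed out front to back. */
typedef struct SimpleArena
{
    char   *base;               /* start of the mmap()'d region */
    size_t  size;               /* total bytes in the region */
    size_t  used;               /* bytes handed out so far */
} SimpleArena;

/* Map an anonymous shared region; returns 0 on success, -1 on failure. */
static int
arena_init(SimpleArena *a, size_t size)
{
    void   *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);

    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->size = size;
    a->used = 0;
    return 0;
}

/*
 * Return the next n bytes, rounded up to a multiple of 8, or NULL if the
 * region is exhausted. There is no matching free for a single chunk.
 */
static void *
arena_alloc(SimpleArena *a, size_t n)
{
    size_t  aligned = (n + 7) & ~(size_t) 7;

    assert(a->used <= a->size);
    if (aligned < n || aligned > a->size - a->used)
        return NULL;            /* overflow or out of space */

    void   *p = a->base + a->used;

    a->used += aligned;
    return p;
}

/* Release the whole region at once. */
static void
arena_destroy(SimpleArena *a)
{
    munmap(a->base, a->size);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

Because the region is mapped MAP_SHARED before any fork(), everything
allocated from it would also cover part (b).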

Regards

Pavel Stehule

>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise Postgres Company
>
