Re: patch: preload dictionary new version

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Takahiro Itagaki <itagaki(dot)takahiro(at)oss(dot)ntt(dot)co(dot)jp>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: patch: preload dictionary new version
Date: 2010-07-09 06:44:35
Message-ID: AANLkTimgw4N_rNFpJANboidG9O5oCdkzGqdKHO0O2jCG@mail.gmail.com
Lists: pgsql-hackers

2010/7/8 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> writes:
>> 2010/7/8 Robert Haas <robertmhaas(at)gmail(dot)com>:
>>> A precompiler can give you all the same memory management benefits.
>
>> I use mmap(), and with mmap a precompiler is not necessary.
>> The dictionary is loaded only once - in the original ispell format. I
>> think it is much simpler for administration - just copy the ispell
>> files. There are no possible problems with binary
>> incompatibility, you don't need to solve serialisation and
>> deserialisation, and you don't need to copy the TSearch ispell parser
>> code to a client application - we would probably still want to support
>> non-compiled ispell dictionaries. Using a precompiler raises new
>> questions for upgrades!
>
> You're inventing a bunch of straw men to attack.  There's no reason that
> a precompiler approach would have to put any new requirements on the
> user.  For example, the dictionary-load code could automatically execute
> the precompile step if it observed that the precompiled copy of the
> dictionary was missing or had an older file timestamp than the source.

Uff - just activating the precompiler safely needs a lot of low-level
code. Maybe I see it wrong, since I don't work directly with files
inside pg, but I can't see it as a simple solution.

>
> I like the idea of a precompiler step mainly because it still gives you
> most of the benefits of the patch on platforms without mmap.  (Instead
> of mmap'ing, just open and read() the precompiled file.)  In particular,
> you would still have a creditable improvement for Windows users without
> writing any Windows-specific code.
>

Loading about 10 MB takes about 30 ms on my computer - that is better
than 90 ms, but it isn't a win.
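
To make the two loading paths concrete, here is a minimal POSIX-only sketch of "mmap if possible, otherwise open and read()". All names (LoadedImage, load_image, unload_image) are hypothetical, and the use_mmap flag merely stands in for whatever compile-time or run-time check would detect mmap support; it is not taken from the patch.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical handle for a precompiled dictionary image. */
typedef struct LoadedImage
{
    void   *data;       /* start of the image */
    size_t  len;        /* image size in bytes */
    int     mapped;     /* 1 if data came from mmap(), 0 if malloc() */
} LoadedImage;

/* Returns 0 on success, -1 on any error (including an empty file). */
static int
load_image(const char *path, int use_mmap, LoadedImage *img)
{
    struct stat st;
    int         fd;

    assert(path != NULL && img != NULL);

    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    if (fstat(fd, &st) != 0 || st.st_size <= 0)
    {
        close(fd);
        return -1;
    }
    img->len = (size_t) st.st_size;
    img->data = NULL;
    img->mapped = 0;

    if (use_mmap)
    {
        void *p = mmap(NULL, img->len, PROT_READ, MAP_PRIVATE, fd, 0);

        if (p != MAP_FAILED)
        {
            img->data = p;
            img->mapped = 1;
        }
    }

    /* Fallback path: plain read() into a malloc'd buffer. */
    if (img->data == NULL)
    {
        char   *buf = malloc(img->len);
        size_t  done = 0;

        if (buf == NULL)
        {
            close(fd);
            return -1;
        }
        while (done < img->len)
        {
            ssize_t n = read(fd, buf + done, img->len - done);

            /* Sketch only: any short file or error (even EINTR) fails. */
            if (n <= 0)
            {
                free(buf);
                close(fd);
                return -1;
            }
            done += (size_t) n;
        }
        img->data = buf;
    }

    close(fd);          /* a mapping stays valid after close() */
    return 0;
}

static void
unload_image(LoadedImage *img)
{
    if (img->mapped)
        munmap(img->data, img->len);
    else
        free(img->data);
    img->data = NULL;
}
```

With read() every backend pays for its own copy of the file contents, while a read-only mapping can share pages between backends; that difference is what the timing comparison above is about.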

>> I think we can divide this problem into three parts
>
>> a) simple allocator - it can be used not only for TSearch dictionaries.
>
> I think that's a waste of time, frankly.  There aren't enough potential
> use cases.
>
>> b) sharing data - it is important for large dictionaries
>
> Useful but not really essential.
>
>> c) preloading - it decreases the load time of the first TSearch query
>
> This is the part that is the make-or-break benefit of the patch.
> You need a solution that cuts load time even when mmap isn't
> available.
>

I am not sure whether such a solution exists, or whether it is
necessary. The main problem is probably specific to the Czech language -
we have a few peculiarities. In the Czech environment, UNIX and Windows
are the most important platforms; I have no information about anyone
here using Postgres and full-text search on other platforms. So the
solution probably doesn't need to be in core. I am now thinking about a
pgfoundry project - something like an ispell dictionary preload.

I can send just a simplified version without preloading and sharing,
one that only solves the memory issue - I think there is no
disagreement about that part.
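
As an illustration of the "simple allocator" idea from point a), here is a minimal bump-allocator sketch. It is an assumption about the general technique, not the code from the patch: SimpleArena, arena_init, arena_alloc and arena_free_all are hypothetical names, and the 8-byte alignment is an arbitrary choice for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define ARENA_ALIGN 8           /* assumed alignment for this sketch */

/* Hypothetical arena: one big block, handed out front to back. */
typedef struct SimpleArena
{
    char   *base;
    size_t  size;
    size_t  used;
} SimpleArena;

/* Returns 0 on success, -1 if the backing block cannot be allocated. */
static int
arena_init(SimpleArena *a, size_t size)
{
    assert(a != NULL && size > 0);
    a->base = malloc(size);
    a->size = size;
    a->used = 0;
    return a->base != NULL ? 0 : -1;
}

/* Returns NULL when the request does not fit in the remaining space. */
static void *
arena_alloc(SimpleArena *a, size_t n)
{
    size_t  rounded = (n + ARENA_ALIGN - 1) & ~(size_t) (ARENA_ALIGN - 1);
    void   *p;

    /* rounded < n catches overflow of the rounding itself. */
    if (rounded < n || rounded > a->size - a->used)
        return NULL;

    /* No per-chunk header: the only overhead is alignment padding. */
    p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Individual chunks are never freed; the whole arena goes at once. */
static void
arena_free_all(SimpleArena *a)
{
    free(a->base);
    a->base = NULL;
    a->size = 0;
    a->used = 0;
}
```

A dictionary consists of a very large number of small allocations that all live equally long, so dropping the per-chunk bookkeeping is where the memory saving would come from.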

best regards

Pavel Stehule

>                        regards, tom lane
>
