Re: [PROPOSAL] Shared Ispell dictionaries

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROPOSAL] Shared Ispell dictionaries
Date: 2018-03-19 18:40:54
Message-ID: 90c04bf7-fcab-8b0b-b461-43b46bf79970@2ndquadrant.com
Lists: pgsql-hackers

On 03/19/2018 07:07 PM, Andres Freund wrote:
> On 2018-03-19 14:52:34 +0100, Tomas Vondra wrote:
>> On 03/19/2018 02:34 AM, Andres Freund wrote:
>>> Hi,
>>>
>>> On 2018-03-19 01:52:41 +0100, Tomas Vondra wrote:
>>>> I do agree with that. We have a working well-understood dsm-based
>>>> solution, addressing the goals initially explained in this thread.
>>>
>>> Well, it's also awkward and manual to use. I do think that's
>>> something we've to pay attention to.
>>>
>>
>> Awkward in what sense?
>
> You've to manually configure a setting that can only be set at server
> start. You can't set it as big as necessary because it might use up
> memory better used for other things. It needs the full space for
> dictionaries even if the majority of it never will be needed. All of
> those aren't needed in an mmap world.
>

Which is not quite true, because that's not what the patch does.

Each dictionary is loaded into a separate dsm segment when needed, and
that segment is then tracked in a dshash table. So most of what you wrote
is not really true - the patch does not pre-allocate the space, and the
setting could be made changeable after server start (it's not defined
like that currently, but that should be trivial to change).
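
To illustrate the load-on-first-use behaviour, here is a minimal sketch in
plain C. It is not code from the patch: DictEntry, dict_table and
dict_lookup_or_load are hypothetical names, a fixed array stands in for the
shared hash table, and calloc stands in for creating a dsm segment. The only
point is that memory for a dictionary is allocated when it is first
requested, not up front.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_DICTS 8

/* Stand-in for one segment holding a compiled dictionary (hypothetical). */
typedef struct DictEntry
{
    int     dict_id;    /* 0 means the slot is unused */
    size_t  size;       /* bytes allocated for this dictionary */
    char   *data;       /* allocated on first use only */
} DictEntry;

/* Stand-in for the shared hash table; a fixed array keeps the sketch short. */
static DictEntry dict_table[MAX_DICTS];
static int       loads_performed = 0;

/*
 * Return the entry for dict_id, allocating it on the first request.
 * Returns NULL if the table is full or the allocation fails.
 */
static DictEntry *
dict_lookup_or_load(int dict_id, size_t size)
{
    DictEntry  *free_slot = NULL;

    assert(dict_id != 0);

    for (int i = 0; i < MAX_DICTS; i++)
    {
        if (dict_table[i].dict_id == dict_id)
            return &dict_table[i];      /* already loaded, reuse it */
        if (dict_table[i].dict_id == 0 && free_slot == NULL)
            free_slot = &dict_table[i]; /* remember the first empty slot */
    }

    if (free_slot == NULL)
        return NULL;                    /* table full; eviction not shown */

    free_slot->data = calloc(1, size);
    if (free_slot->data == NULL)
        return NULL;

    free_slot->dict_id = dict_id;
    free_slot->size = size;
    loads_performed++;
    return free_slot;
}
```

A second request for the same dict_id returns the existing entry without
allocating again, so only dictionaries that are actually used take up memory.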

>
>> So, I'm not at all convinced the mmap approach is actually better
>> than the dsm one. And I believe that if we come up with a good way
>> to automate some of the tasks, I don't see why would that be
>> possible in the mmap and not dsm.
>
> To me it seems we'll end up needing a heck of a lot more code that
> the OS already implements if we do it ourselves.
>

Like what? Which features do you expect to need much more code?

The automated reloading will need a fairly small amount of code - the
main issue is deciding when to reload, and as I mentioned before that's
more complicated than you seem to believe. In fact, it may not even be
possible - there's no way to decide if all files are already updated.
Currently we kinda ignore that, on the assumption that dictionaries
change only rarely. We may do the same thing and reload the dict if at
least one file changes. In any case, the amount of code is trivial.
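
For illustration, the "reload if at least one file changes" check could look
roughly like the sketch below. This is an assumption about how it might be
done, not something taken from the patch: dict_files_changed is a
hypothetical helper, and comparing st_mtime against the load time is just one
possible way to detect a change.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/stat.h>
#include <time.h>

/*
 * Hypothetical helper, not part of the patch: returns true if at least one
 * of the dictionary's files was modified after the dictionary was loaded.
 * A file that can no longer be stat()ed also counts as changed.
 */
static bool
dict_files_changed(const char *const *paths, size_t npaths, time_t loaded_at)
{
    assert(paths != NULL || npaths == 0);

    for (size_t i = 0; i < npaths; i++)
    {
        struct stat st;

        if (stat(paths[i], &st) != 0)
            return true;        /* missing or unreadable file */
        if (st.st_mtime > loaded_at)
            return true;        /* modified since the last load */
    }
    return false;
}
```

Note that this only says some file changed. It cannot tell whether all files
of the dictionary have been updated yet, which is the part that is hard to
decide, and st_mtime has one-second granularity, so a change made in the same
second as the load would be missed.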

In fact, it may be more complicated in the mmap case - how do you update
a dictionary that is already mapped to multiple processes?

The eviction is harder - I'll give you that. But then again, I'm not
sure the mmap approach is really what we want here - it seems better to
evict the whole dictionary, than some random pages from many of them.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
