Re: [PROPOSAL] Shared Ispell dictionaries

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROPOSAL] Shared Ispell dictionaries
Date: 2018-03-19 13:52:34
Message-ID: 9c27384d-4538-15be-0e94-0a67a52e7b0b@2ndquadrant.com
Lists: pgsql-hackers

On 03/19/2018 02:34 AM, Andres Freund wrote:
> Hi,
>
> On 2018-03-19 01:52:41 +0100, Tomas Vondra wrote:
>> I do agree with that. We have a working well-understood dsm-based
>> solution, addressing the goals initially explained in this thread.
>
> Well, it's also awkward and manual to use. I do think that's
> something we have to pay attention to.
>

Awkward in what sense?

I don't think the manual aspect is an issue. Currently we have no way to
reload the dictionary, except for restarting all the backends. I don't
see that as a particularly convenient solution. Also, this is pretty
much how the shared_ispell extension works, although you might argue
that was more due to the limitation of how shared memory could be used
in extensions before DSM was introduced. In any case, I've never heard
complaints about this aspect of the extension.

There are two things that might be automated: reloading of dictionaries
and evicting them when hitting the memory limit. I have tried to
implement that in the shared_ispell extension, but it's a bit more
complicated than it looks.

For example, it seems obvious to reload the dictionary when the file
timestamp changes. But in fact there are three files: dictionary,
affixes and stop words. So do you reload when a single file changes, or
only when all of them have changed? Keep in mind that a new version of
the dictionary may use different affixes, so a reload at the wrong
moment may produce broken results.

>
>> I wonder how much of this patch would be affected by the switch
>> from dsm to mmap? I guess the memory limit would get mostly
>> irrelevant (mmap would rely on the OS to page the memory in/out
>> depending on memory pressure), and so would the UNLOAD/RELOAD
>> commands (because each backend would do its own mmap).
>
> Those seem fairly major.
>

I'm not sure I'd say those are major. And you might also see the lack of
these capabilities as negative points for the mmap approach.

So, I'm not at all convinced the mmap approach is actually better than
the dsm one. And if we come up with a good way to automate some of the
tasks, I don't see why that would be possible with mmap but not with
dsm.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
