Re: [PROPOSAL] Shared Ispell dictionaries

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
Cc: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ildus Kurbangaliev <i(dot)kurbangaliev(at)postgrespro(dot)ru>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROPOSAL] Shared Ispell dictionaries
Date: 2018-03-19 00:52:41
Message-ID: d4d52484-e269-9d7b-cef5-c1e95a03d9c6@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 03/17/2018 05:43 AM, Arthur Zakirov wrote:
> Hello Tomas,
>
> Arthur, what are your plans with this patch in the current CF?
>
>
> I think dsm-based approach is in good shape already and works nice.
> I've planned only to improve the documentation a little. Also it seems I
> should change 0004 part, I found that extension upgrade scripts may be
> made in wrong way.
> In my opinion RELOAD and UNLOAD commands can be made in next commitfest
> (2018-09).
> Did you look it? Have you arguments about how shared memory allocation
> and releasing functions are made?
>  
>
>
> It does not seem to be moving towards RFC very much, and reworking the
> patch to use mmap() seems like a quite significant change late in the
> CF. Which means it's likely to cause the patch get get bumped to the
> next CF (2018-09).
>
>
> Agree. I have a draft version for mmap-based approach which works in
> platforms with mmap. In Windows it is necessary to use another API
> (CreateFileMapping, etc). But this approach requires more work on
> handling processed dictionary files (how name them, when remove).
>  
>
>
> FWIW I am not quite sure if the mmap() approach is better than what was
> implemented by the patch. I'm not sure how exactly will it behave under
> memory pressure (AFAIK it goes through page cache, which means random
> parts of dictionaries might get evicted) or how well is it supported on
> various platforms (say, Windows).
>
>
> Yes, as I wrote mmap-based approach requires more work. The only
> benefit I see is that you don't need to process a dictionary after
> server restart. I'd vote for dsm-based approach.
>

I do agree with that. We have a working well-understood dsm-based
solution, addressing the goals initially explained in this thread.

I don't see a reason to stall this patch based on a mere assumption that
the mmap-based approach might be magically better in some unknown
aspects. It might be, but we may as well leave that as a future work.

I wonder how much of this patch would be affected by the switch from dsm
to mmap? I guess the memory limit would get mostly irrelevant (mmap
would rely on the OS to page the memory in/out depending on memory
pressure), and so would the UNLOAD/RELOAD commands (because each backend
would do it's own mmap).

In any case, I suggest to polish the dsm-based patch, and see if we can
get that one into PG11.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2018-03-19 01:08:31 Re: HELP
Previous Message Tom Lane 2018-03-19 00:28:18 Re: ECPG installcheck tests fail if PGDATABASE is set