Re: [PROPOSAL] Shared Ispell dictionaries

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROPOSAL] Shared Ispell dictionaries
Date: 2019-01-22 19:17:56
Message-ID: 1932084f-167a-8893-be24-c4c06afe113b@2ndquadrant.com
Lists: pgsql-hackers

On 1/22/19 7:36 PM, Arthur Zakirov wrote:
> Mon, 21 Jan 2019 at 19:42, Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>:
>>
>> On 21.01.2019 17:56, Tomas Vondra wrote:
>>> I wonder if we could devise some simple cache eviction policy. We don't
>>> have any memory limit GUC anymore, but maybe we could unload
>>> dictionaries that were unused for a sufficient amount of time (a couple of
>>> minutes or so). Of course, the question is when exactly would it happen
>>> (it seems far too expensive to invoke on each dict access, and it should
>>> happen even when the dicts are not accessed at all).
>>
>> Yes, I thought about such feature too. Agree, it could be expensive
>> since we need to scan pg_ts_dict table to get list of dictionaries (we
>> can't scan dshash_table).
>>
>> I don't have a good solution yet. I just had a thought to bring back
>> max_shared_dictionaries_size. Then we can unload dictionaries (and scan
>> the pg_ts_dict table) that were accessed a long time ago if we reached
>> the size limit.
>> We can't set exact size limit since we can't release the memory
>> immediately. So max_shared_dictionaries_size can be renamed to
>> shared_dictionaries_threshold. If it is equal to "0" then PostgreSQL has
>> unlimited space for dictionaries.
>
> I want to propose cleaning up segments during vacuum/autovacuum. I'm not
> aware of the policy on cleaning up objects besides relations during
> vacuum/autovacuum. Could it be a good idea?
>

I doubt that's a good idea, for a couple of reasons. For example, would
it be bound to autovacuum on a particular object or would it happen as
part of each vacuum run? If the dict cleanup happens only when vacuuming
a particular object, then which one? If it happens on each autovacuum
run, then that may easily be far too frequent (it essentially makes the
cases with too frequent autovacuum runs even worse).

But also what happens when there is only minimal write activity and thus no
regular autovacuum runs? Surely we should still do the dict cleanup.

> Vacuum might unload dictionaries when total size of loaded dictionaries
> exceeds a threshold. When it happens vacuum scans loaded dictionaries and
> unloads (unpins segments and removes hash table entries) those dictionaries
> which aren't mapped to any backend process (it happens because
> dsm_pin_segment() is called) anymore.
>

Then why bind that to autovacuum at all? Why not just make it part
of loading the dictionary?

> max_shared_dictionaries_size can be renamed to
> shared_dictionaries_cleanup_threshold.
>

That really depends on what exactly the threshold does. If it only
triggers cleanup but does not enforce a maximum amount of memory used by
dictionaries, then this name seems OK. If it ensures a max amount of
memory, the max_..._size name would be better.

I think there are essentially two ways:

(a) Define max amount of memory available for shared dictionaries, and
come up with an eviction algorithm. This will be tricky, because when
the frequently-used dictionaries need a bit more memory than the limit,
this will result in thrashing (evict+load over and over).

(b) Define what "unused" means for dictionaries, and unload dictionaries
that become unused. For example, we could track the timestamp of the
last time each dict was used, treat dictionaries not used for 5 or more
minutes as unused, and evict those.
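
To make (b) a bit more concrete, here is a rough sketch of the bookkeeping
in Python. It is only a toy model, not PostgreSQL code: the class name, the
method names and the explicit "now" argument are all hypothetical, and it
says nothing about when the eviction pass would actually be invoked.

```python
from dataclasses import dataclass, field

IDLE_LIMIT_SECONDS = 5 * 60  # "unused for 5 or more minutes"


@dataclass
class SharedDictCache:
    """Toy model of option (b); every name here is hypothetical."""

    idle_limit: float = IDLE_LIMIT_SECONDS
    # dictionary name -> time of the last access, in seconds
    last_used: dict = field(default_factory=dict)

    def use(self, name: str, now: float) -> None:
        # Loading and using a dictionary both just refresh its timestamp.
        self.last_used[name] = now

    def evict_unused(self, now: float) -> list:
        # Unload every dictionary that has been idle for idle_limit or longer.
        stale = [n for n, t in self.last_used.items()
                 if now - t >= self.idle_limit]
        for n in stale:
            del self.last_used[n]
        return sorted(stale)
```

For example, if two dictionaries are loaded at time 0 and only one of them
is used again at 200 seconds, an eviction pass at 300 seconds unloads just
the other one.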

The advantage of (b) is that it adapts automatically, more or less. When
you have a bunch of frequently used dictionaries, the amount of shared
memory increases. If you stop using them, it decreases after a while.
And rarely used dicts won't force eviction of the frequently used ones.
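
For contrast, the thrashing problem with (a) is easy to reproduce in a
small simulation. The sketch below assumes a hard memory limit with plain
LRU eviction, which is just one possible eviction algorithm and not
something the patch implements; the function name and its parameters are
made up for illustration.

```python
from collections import OrderedDict


def count_loads(accesses, sizes, limit):
    """Count dictionary loads under a hard memory limit with LRU eviction."""
    loaded = OrderedDict()  # name -> size, least recently used first
    loads = 0
    for name in accesses:
        if name in loaded:
            loaded.move_to_end(name)  # hit: mark as most recently used
            continue
        loads += 1  # miss: the dictionary has to be (re)loaded
        loaded[name] = sizes[name]
        # Evict least recently used dictionaries until we fit the limit again.
        # The dictionary just loaded is never evicted here.
        while sum(loaded.values()) > limit and len(loaded) > 1:
            loaded.popitem(last=False)
    return loads
```

With three dictionaries of size 40 used in rotation ten times each, a limit
of 100 makes every one of the 30 accesses a reload, while a limit of 120
needs only the 3 initial loads.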

cheers

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
