Re: [PROPOSAL] Shared Ispell dictionaries

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PROPOSAL] Shared Ispell dictionaries
Date: 2018-01-13 21:33:14
Message-ID: 404f06c5-c7a3-3313-1a64-25f08ba089ad@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 01/13/2018 04:22 PM, Arthur Zakirov wrote:
> Hello,
>
> Thank you Tomas for your review.
>
> On Sat, Jan 13, 2018 at 03:25:55AM +0100, Tomas Vondra wrote:
>> allocate memory in the buildCxt. What about adding tmpstrdup to copy a
>> string into the context? I admit this is mostly nitpicking though.
>
> ... snip ...>
>> 8) One more thing - I've noticed that the hash table uses this key:
>> ...
>> That is, full paths to the two files, and I'm not sure that's a very
>> good idea. Firstly, it's a bit wasteful (1kB per path). But more
>> importantly it means all dictionaries referencing the same files will
>> share the same chunk of shared memory - not only within a single
>> database, but across the whole cluster. That may lead to surprising
>> behavior, because e.g. unloading a dictionary in one database will
>> affect dictionaries in all other databases referencing the same files.
>
> Hm, indeed. It's worth to use only file names instead full paths. And
> it is good idea to use more information besides file names. It can be
> Oid of a database and Oid of a namespace maybe, because a
> dictionary can be created in different schemas.
>

I doubt using filenames (without the directory paths) solves anything,
really. The keys still have to be MAXPGPATH because someone could create
very long filename. But I don't think memory consumption is such a big
deal, really. With 1000 dictionaries it's still just ~2MB of data, which
is negligible compared to the amount of memory saved by sharing the
dictionaries.

Not sure if we really need to add the database/schema OIDs. I mentioned
the unexpected consequences (cross-db sharing) but maybe that's a
feature we should keep (it reduces memory usage). So perhaps this should
be another CREATE TEXT SEARCH DICTIONARY parameter, allowing sharing the
dictionary with other databases?

Aren't we overengineering this?

> I think your proposals may be implemented in several patches, so they can
> be applyed independently but consistently. I suppose I will prepare new
> version of the patch with fixes and with initial design of new functions
> and commands soon.
>

Yes, splitting patches into smaller, more focused bits is a good idea.

BTW the current patch fails to document the dictionary sharing. It only
mentions it when describing the shared_dictionaries GUC. IMHO the right
place for additional details is

https://www.postgresql.org/docs/10/static/textsearch-dictionaries.html

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Frost 2018-01-13 21:52:26 Re: proposal: alternative psql commands quit and exit
Previous Message Michael Meskes 2018-01-13 20:17:45 Re: [PATCH] ECPG bug fix in preproc when indicator struct is shorter than record struct