[PROPOSAL] Shared Ispell dictionaries

From: Arthur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
To: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: [PROPOSAL] Shared Ispell dictionaries
Date: 2017-12-26 16:48:27
Message-ID: 20171226164825.GA29922@zakirov.localdomain
Lists: pgsql-hackers

Hello, hackers!


I'm going to implement a patch that stores Ispell dictionaries in shared memory.

There is an extension, shared_ispell [1], developed by Tomas Vondra, but it is a poor candidate for inclusion in contrib,
because it has to know a lot about the internals of the IspellDict struct in order to copy it into shared memory.


Shared Ispell dictionaries give the following improvements:
- lower memory consumption: currently an Ispell dictionary is loaded into memory by every backend separately, and some dictionaries need more than 100MB each
- no dictionary loading overhead on the first call of a full text search function (such as to_tsvector() or to_tsquery()) in each backend


All structures related to IspellDict have to be changed: SPNode, AffixNode, AFFIX, CMPDAffix and IspellDict itself. None of them may contain pointers, because a DSM segment can be mapped at a different address in each backend. The remaining structures are used only while the dictionary is being built.
It would also be good to store the StopList struct in shared memory.
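To illustrate why the pointers have to go: a segment may be mapped at a different address in each backend, so links inside the dictionary can be stored as byte offsets from the start of the segment instead. Below is a minimal sketch, assuming hypothetical simplified types NodeOff/NodeDataOff in place of the real SPNode/SPNodeData (their actual pointer-free layout is not decided in this proposal). It builds a tiny trie in one buffer and walks it purely through offsets.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical, simplified stand-ins for SPNode/SPNodeData: the child link
 * is a byte offset from the start of the dictionary buffer, not a pointer,
 * so the buffer stays valid wherever it is mapped.
 */
typedef struct
{
    uint8_t  val;        /* character on this edge */
    uint8_t  isword;     /* 1 if a word ends at this character */
    uint32_t child_off;  /* offset of the child node; 0 means no child */
} NodeDataOff;

typedef struct
{
    uint32_t    length;  /* number of entries in data[] */
    NodeDataOff data[];  /* one entry per outgoing character */
} NodeOff;

/* Resolve an offset against the address the buffer is mapped at now. */
static NodeOff *
node_at(char *base, uint32_t off)
{
    return (NodeOff *) (base + off);
}

/* Reserve room for a node with n entries and return its offset. */
static uint32_t
alloc_node(char *base, uint32_t *used, uint32_t n)
{
    uint32_t off = *used;

    assert(off % sizeof(uint32_t) == 0);   /* nodes must stay 4-byte aligned */
    node_at(base, off)->length = n;
    *used += (uint32_t) (sizeof(NodeOff) + n * sizeof(NodeDataOff));
    return off;
}

/* Build a trie for "a", "ab" and "b" with the root at offset 0. */
static char *
build_demo(uint32_t *size)
{
    char    *base = calloc(1, 64);
    uint32_t used = 0;
    uint32_t root, child;

    if (base == NULL)
        return NULL;
    root = alloc_node(base, &used, 2);
    child = alloc_node(base, &used, 1);
    node_at(base, root)->data[0] = (NodeDataOff) {'a', 1, child};
    node_at(base, root)->data[1] = (NodeDataOff) {'b', 1, 0};
    node_at(base, child)->data[0] = (NodeDataOff) {'b', 1, 0};
    *size = used;
    return base;
}

/* Walk the trie by offsets; true if word is stored in the dictionary. */
static bool
find_word(char *base, uint32_t root_off, const char *word)
{
    NodeOff *node = node_at(base, root_off);

    for (const char *p = word; *p != '\0'; p++)
    {
        NodeDataOff *hit = NULL;

        for (uint32_t i = 0; i < node->length; i++)
        {
            if (node->data[i].val == (uint8_t) *p)
            {
                hit = &node->data[i];
                break;
            }
        }
        if (hit == NULL)
            return false;
        if (p[1] == '\0')
            return hit->isword;
        if (hit->child_off == 0)
            return false;
        node = node_at(base, hit->child_off);
    }
    return false;
}
```

Copying the buffer with memcpy() into a fresh allocation and calling find_word() on the copy gives the same answers, which is exactly the property a structure placed in a shared segment needs.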

All IspellDict fields that are used only during dictionary building will be moved into a new IspellDictBuild struct, to reduce the amount of shared memory needed. They will be released along with buildCxt.
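A minimal sketch of that split, assuming hypothetical DictBuild and Dict types that stand in for IspellDictBuild and a slimmed-down IspellDict. The build struct may use pointers and spare capacity; build_finish() computes the exact size and copies only the lookup data into one contiguous block. Plain malloc()/free() stand in for buildCxt here; in a backend, deleting the memory context would release the build data in one step.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical build-time state (standing in for IspellDictBuild): pointers
 * and spare capacity are fine here, because it never leaves the building
 * backend and is thrown away once the dictionary is finished.
 */
typedef struct
{
    char   **affixes;    /* individually allocated strings */
    uint32_t naffixes;
    uint32_t capacity;
} DictBuild;

/*
 * Hypothetical runtime dictionary: one contiguous, pointer-free block that
 * holds only what lookups need, so it can be copied into a segment as is.
 */
typedef struct
{
    uint32_t naffixes;
    uint32_t offsets[];  /* offset of each string from the start of Dict */
} Dict;

static int
build_add(DictBuild *b, const char *affix)
{
    size_t len = strlen(affix) + 1;
    char  *copy;

    if (b->naffixes == b->capacity)
    {
        uint32_t newcap = b->capacity ? b->capacity * 2 : 4;
        char   **tmp = realloc(b->affixes, newcap * sizeof(char *));

        if (tmp == NULL)
            return -1;
        b->affixes = tmp;
        b->capacity = newcap;
    }
    if ((copy = malloc(len)) == NULL)
        return -1;
    memcpy(copy, affix, len);
    b->affixes[b->naffixes++] = copy;
    return 0;
}

/* Compute the exact size and copy only the lookup data into one block. */
static Dict *
build_finish(const DictBuild *b, size_t *size)
{
    size_t header = sizeof(Dict) + b->naffixes * sizeof(uint32_t);
    size_t total = header;
    size_t pos = header;
    Dict  *d;

    for (uint32_t i = 0; i < b->naffixes; i++)
        total += strlen(b->affixes[i]) + 1;
    if ((d = malloc(total)) == NULL)
        return NULL;
    d->naffixes = b->naffixes;
    for (uint32_t i = 0; i < b->naffixes; i++)
    {
        size_t len = strlen(b->affixes[i]) + 1;

        d->offsets[i] = (uint32_t) pos;
        memcpy((char *) d + pos, b->affixes[i], len);
        pos += len;
    }
    assert(pos == total);
    *size = total;
    return d;
}

/* Free all build-only data at once (buildCxt would be deleted instead). */
static void
build_free(DictBuild *b)
{
    for (uint32_t i = 0; i < b->naffixes; i++)
        free(b->affixes[i]);
    free(b->affixes);
    b->affixes = NULL;
    b->naffixes = b->capacity = 0;
}

static const char *
dict_affix(const Dict *d, uint32_t i)
{
    assert(i < d->naffixes);
    return (const char *) d + d->offsets[i];
}
```

The design point is that the size of the shared block is known exactly before it is allocated, and nothing in it refers back to the build-time allocations.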

Each dictionary will be stored in its own DSM segment. Structures for regular expressions won't be stored in shared memory; they will be compiled in every backend.
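A sketch of the per-backend side, assuming a hypothetical LocalDict struct and a segment layout of consecutive NUL-terminated affix condition strings. The shared data stays pointer-free, while each backend compiles its own regex_t objects after attaching, since compiled regexes contain pointers into backend-local memory. POSIX regcomp()/regexec() stand in here for PostgreSQL's own regex engine.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical backend-local view of one shared dictionary. The segment
 * holds only pointer-free data (here: affix conditions stored as
 * consecutive NUL-terminated strings); the compiled regexes live in this
 * backend's private memory.
 */
typedef struct
{
    const char *segment;   /* address this backend mapped the segment at */
    uint32_t    nconds;    /* number of condition strings in the segment */
    regex_t    *regexes;   /* backend-local, compiled from the segment */
} LocalDict;

/* Compile every condition found in the segment; 0 on success. */
static int
local_dict_attach(LocalDict *ld, const char *segment, uint32_t nconds)
{
    const char *p = segment;

    ld->segment = segment;
    ld->nconds = nconds;
    ld->regexes = calloc(nconds ? nconds : 1, sizeof(regex_t));
    if (ld->regexes == NULL)
        return -1;
    for (uint32_t i = 0; i < nconds; i++)
    {
        if (regcomp(&ld->regexes[i], p, REG_EXTENDED | REG_NOSUB) != 0)
        {
            while (i-- > 0)            /* undo the ones already compiled */
                regfree(&ld->regexes[i]);
            free(ld->regexes);
            ld->regexes = NULL;
            return -1;
        }
        p += strlen(p) + 1;            /* advance to the next condition */
    }
    return 0;
}

static bool
local_dict_match(const LocalDict *ld, uint32_t i, const char *word)
{
    assert(i < ld->nconds);
    return regexec(&ld->regexes[i], word, 0, NULL, 0) == 0;
}

/* Release backend-local state; the shared segment is left untouched. */
static void
local_dict_detach(LocalDict *ld)
{
    for (uint32_t i = 0; i < ld->nconds; i++)
        regfree(&ld->regexes[i]);
    free(ld->regexes);
    ld->regexes = NULL;
}
```

Two "backends" attaching to the same segment each get their own compiled copies, and detaching one does not affect the other or the shared data.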

The patch will be ready and added to the 2018-03 commitfest.

Thank you for your attention. Any thoughts?

[1] github.com/tvondra/shared_ispell or github.com/postgrespro/shared_ispell

Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

