Re: easy way of copying regex_t

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: easy way of copying regex_t
Date: 2016-01-25 10:07:17
Message-ID: 56A5F3D5.9030702@2ndquadrant.com
Lists: pgsql-hackers

On 01/24/2016 10:41 PM, Tom Lane wrote:
> Artur Zakirov <a(dot)zakirov(at)postgrespro(dot)ru> writes:
>> With this message I want to send some patch to your repository with
>> draft of a code, which allows shared_ispell to copy regex_t.
>
> Allowing ispell.c to know that much about regex internals strikes me
> as completely unacceptable from a modularity or maintainability
> standpoint. If we want to do that, the appropriate thing would be to
> add a function to backend/regex/ that copies a regex_t.

I share the feeling - that's too much insight into the internal regex data
structures, so it should be part of backend/regex, not the extension.
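If we did go that way, I'd expect the extension to only ever see something
like the declaration below - the name and signature are purely hypothetical,
nothing like it exists in backend/regex today:

#include "regex/regex.h"

/*
 * Hypothetical interface only.  The point is that any deep copy of a
 * compiled regex_t (its guts, NFA arcs, colormap, and the function-pointer
 * table mentioned below) would live behind a function like this inside
 * backend/regex, not in ispell code.
 */
extern int pg_regcopy(regex_t *dst, const regex_t *src);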

> However, I'm rather suspicious of the safety of copying a regex_t into
> shared memory in the first place. It contains function pointers, which
> we have not historically assumed would be portable between different
> backend processes. And the regex library is old enough to have never
> heard of thread safety, so I'm not really sure that it considers the
> regex_t structures to be read-only at execution time.

Right, it's definitely not thread-safe, so there would need to be some lock
protecting the regex_t copy. I was thinking about either using a group of
locks, each protecting a small subset of the affixes (thus allowing some
degree of parallelism), or using a single lock, with each process making a
private copy at the beginning.
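For illustration, the "group of locks" variant would look roughly like the
sketch below - the names, the lock count and the hashing by affix index are
all made up for the sketch, not taken from shared_ispell:

#include "postgres.h"

#include "mb/pg_wchar.h"
#include "regex/regex.h"
#include "storage/lwlock.h"

#define AFFIX_LOCK_COUNT 16		/* arbitrary; trades memory for concurrency */

/* Illustrative shared state: each lock guards one slice of the affix regexes. */
typedef struct SharedAffixRegexes
{
	LWLock	   *locks[AFFIX_LOCK_COUNT];
	regex_t		regexes[FLEXIBLE_ARRAY_MEMBER];
} SharedAffixRegexes;

/*
 * Run one shared regex under its group lock.  The lock is exclusive because
 * we can't assume regex execution treats the regex_t as read-only.
 */
static int
shared_affix_regexec(SharedAffixRegexes *sar, int affix_idx,
					 const pg_wchar *word, int wordlen)
{
	LWLock	   *lock = sar->locks[affix_idx % AFFIX_LOCK_COUNT];
	int			result;

	LWLockAcquire(lock, LW_EXCLUSIVE);
	result = pg_regexec(&sar->regexes[affix_idx], word, wordlen, 0,
						NULL, 0, NULL, 0);
	LWLockRelease(lock);

	return result;
}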

In the end, I've decided to do it differently and simply parse the affix
list from scratch in each process. The affix list is tiny and takes less
than a millisecond to parse in most cases, and I don't have to care about
the regex stuff at all. The main benefit comes from sharing the parsed word
list anyway.
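The per-process part is trivial, roughly like the sketch below - the names
here are made up; NIStartBuild/NIImportAffixes are the actual build/parse
entry points in src/backend/tsearch/spell.c:

#include "postgres.h"

#include "tsearch/dicts/spell.h"

/*
 * Illustrative per-backend setup: the parsed word list comes from the
 * shared segment (not shown), while the affix file is re-parsed locally,
 * so no regex_t ever has to cross a process boundary.
 */
static IspellDict local_conf;
static bool affixes_loaded = false;

static void
load_affixes_locally(const char *affix_path)
{
	if (affixes_loaded)
		return;

	NIStartBuild(&local_conf);
	NIImportAffixes(&local_conf, affix_path);	/* sub-millisecond for typical files */

	affixes_loaded = true;
}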

>
>> shared_ispell loads dictionaries into shared memory. The main benefits
>> are:
>> - saving of memory. Every dictionary is loaded only once. Dictionaries
>> are not loaded for each backend. In the current version of PostgreSQL,
>> dictionaries are loaded in each backend where they are requested.
>> - saving of time. The first load of a dictionary takes a lot of time. With
>> this patch dictionaries will be loaded only once.
>
> Where does the shared memory space come from? It would not take too
> many dictionaries to use up whatever slop is available.

It's an old-school shared segment created by the extension at init time.
You're right that the size is fixed, so it's possible to run out of space by
loading too many dictionaries, but that was not a big deal for the type of
setups it was designed for - in those cases the list of dictionaries is
stable, so the segment can be sized accordingly in advance.
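For reference, the setup follows the usual preloaded-extension pattern,
roughly like the sketch below (the names and the hard-coded size are
illustrative, not the actual shared_ispell code):

#include "postgres.h"

#include "miscadmin.h"
#include "storage/ipc.h"
#include "storage/lwlock.h"
#include "storage/shmem.h"

/* illustrative fixed size; the real extension makes this configurable */
#define DICT_SHMEM_SIZE		(32 * 1024 * 1024)

static shmem_startup_hook_type prev_shmem_startup_hook = NULL;
static void *dict_shmem = NULL;

static void
dict_shmem_startup(void)
{
	bool		found;

	if (prev_shmem_startup_hook)
		prev_shmem_startup_hook();

	LWLockAcquire(AddinShmemInitLock, LW_EXCLUSIVE);
	dict_shmem = ShmemInitStruct("shared ispell dictionaries",
								 DICT_SHMEM_SIZE, &found);
	if (!found)
		memset(dict_shmem, 0, DICT_SHMEM_SIZE);	/* first process initializes */
	LWLockRelease(AddinShmemInitLock);
}

void
_PG_init(void)
{
	/* has to run via shared_preload_libraries to reserve the space */
	if (!process_shared_preload_libraries_in_progress)
		return;

	RequestAddinShmemSpace(DICT_SHMEM_SIZE);

	prev_shmem_startup_hook = shmem_startup_hook;
	shmem_startup_hook = dict_shmem_startup;
}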

But I guess we could do better now that we have dynamic shared memory,
possibly allocating one segment per dictionary as needed, or something
like that.
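Something like the sketch below, perhaps - a tiny registry in the main
segment mapping dictionary names to DSM handles, with backends attaching on
demand; again, all the names are made up:

#include "postgres.h"

#include "storage/dsm.h"

/* Illustrative registry entry kept in the main (fixed) shared segment. */
typedef struct DictRegistryEntry
{
	char		name[NAMEDATALEN];
	dsm_handle	handle;			/* 0 while the dictionary is not loaded */
} DictRegistryEntry;

/* First loader: build the parsed dictionary inside a fresh DSM segment. */
static dsm_handle
publish_dictionary(Size size)
{
	dsm_segment *seg = dsm_create(size, 0);

	/* keep the segment alive after this backend detaches */
	dsm_pin_segment(seg);

	/* ... copy the parsed word list into dsm_segment_address(seg) ... */

	return dsm_segment_handle(seg);
}

/* Other backends: map an already-published dictionary. */
static void *
attach_dictionary(dsm_handle handle)
{
	dsm_segment *seg = dsm_attach(handle);

	/* keep the mapping for the rest of this backend's lifetime */
	dsm_pin_mapping(seg);

	return dsm_segment_address(seg);
}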

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
