Creating a DSA area to provide work space for parallel execution

From: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
To: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Creating a DSA area to provide work space for parallel execution
Date: 2016-10-04 21:32:21
Message-ID: CAEepm=0HmRefi1+xDJ99Gj5APHr8Qr05KZtAxrMj8b+ay3o6sA@mail.gmail.com
Lists: pgsql-hackers

Hi hackers,

A couple of months ago I proposed dynamic shared areas[1]. DSA areas
are dynamically sized shared memory heaps that backends can use to
share data, building on top of the existing DSM infrastructure.

One target use case for DSA areas is to provide work space for
parallel query execution. To that end, here is a patch to create a
DSA area for use by executor code. The area is automatically attached
to the leader and all worker processes for the duration of the
parallel query, and is available as estate->es_query_area.

Backends already have access to shared memory through a single DSM
segment managed with a table-of-contents. The TOC provides a way to
carve out some shared storage space for individual executor nodes and
look it up later by plan node ID. That works for things like
ParallelHeapScanDescData whose size is known up front, but not so well
if you need something more like a heap in which to build shared data
structures. Through estate->es_query_area, a parallel-aware executor
node can use and recycle arbitrary amounts of shared memory with an
allocate/free interface.
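
To make the contrast with up-front sizing concrete, here is a toy sketch of
an allocate/free interface over a fixed region, where "pointers" are offsets
that each process resolves against its own mapping. This is not the DSA
implementation or its API: toy_area, toy_pointer, and all toy_* functions
are hypothetical names invented for illustration, chunks are a single fixed
size, and the locking a real shared heap needs is left out entirely.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_AREA_SIZE  4096
#define TOY_CHUNK_SIZE 64            /* one fixed chunk size keeps the sketch short */

/* Hypothetical stand-in for a shared pointer: an offset, not an address. */
typedef size_t toy_pointer;
#define TOY_INVALID ((toy_pointer) 0)

typedef struct toy_area
{
    toy_pointer   free_head;         /* most recently freed chunk, or TOY_INVALID */
    size_t        next_unused;       /* offset of the first never-used chunk */
    unsigned char bytes[TOY_AREA_SIZE];
} toy_area;

void
toy_area_init(toy_area *area)
{
    area->free_head = TOY_INVALID;
    area->next_unused = TOY_CHUNK_SIZE;   /* skip offset 0 so it can mean "invalid" */
}

/* Resolve an offset against this process's view of the area. */
void *
toy_get_address(toy_area *area, toy_pointer p)
{
    assert(p != TOY_INVALID && p + TOY_CHUNK_SIZE <= TOY_AREA_SIZE);
    return area->bytes + p;
}

/* Hand out one chunk, preferring recycled chunks over fresh space. */
toy_pointer
toy_allocate(toy_area *area, size_t size)
{
    toy_pointer p;

    if (size > TOY_CHUNK_SIZE)
        return TOY_INVALID;               /* the toy only knows one chunk size */

    if (area->free_head != TOY_INVALID)
    {
        /* Reuse a freed chunk; its first bytes hold the next free offset. */
        p = area->free_head;
        memcpy(&area->free_head, toy_get_address(area, p), sizeof(toy_pointer));
        return p;
    }

    if (area->next_unused + TOY_CHUNK_SIZE > TOY_AREA_SIZE)
        return TOY_INVALID;               /* area exhausted */

    p = area->next_unused;
    area->next_unused += TOY_CHUNK_SIZE;
    return p;
}

/* Push the chunk onto the free list, linking by offset rather than address. */
void
toy_free(toy_area *area, toy_pointer p)
{
    memcpy(toy_get_address(area, p), &area->free_head, sizeof(toy_pointer));
    area->free_head = p;
}
```

Because every link is an offset, the same toy_pointer stays meaningful if
the region is mapped at a different address in another process. In the
patch the equivalent calls would go through estate->es_query_area and the
DSA interface (dsa_allocate and friends) rather than anything shown here.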

Motivating use cases include shared bitmaps and shared hash tables
(patches to follow).

For now, you still need the existing DSM segment as well. In order to
share data structures in the DSA area, you need a way to exchange
pointers to find them, and the existing segment + TOC mechanism is
ideal for that.
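
As a rough illustration of that pointer exchange, the sketch below models a
table of contents as a small array of (key, value) pairs: one process
publishes the shared pointer to a data structure under a key such as a plan
node ID, and another looks it up by the same key. It is a simplified
stand-in, not the real TOC code: toy_toc, toy_dsa_pointer, and the toy_toc_*
functions are hypothetical names, and synchronization is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_TOC_MAX 8

/* Hypothetical stand-in for a pointer into the DSA area (an offset). */
typedef uint64_t toy_dsa_pointer;

typedef struct toy_toc_entry
{
    uint64_t        key;             /* e.g. a plan node ID */
    toy_dsa_pointer value;           /* where the shared structure lives */
} toy_toc_entry;

typedef struct toy_toc
{
    size_t        nentries;
    toy_toc_entry entries[TOY_TOC_MAX];
} toy_toc;

/* Publish a shared pointer under a key; returns 0 if the table is full. */
int
toy_toc_insert(toy_toc *toc, uint64_t key, toy_dsa_pointer value)
{
    if (toc->nentries >= TOY_TOC_MAX)
        return 0;
    toc->entries[toc->nentries].key = key;
    toc->entries[toc->nentries].value = value;
    toc->nentries++;
    return 1;
}

/* Find the pointer published under a key; returns 0 if there is none. */
int
toy_toc_lookup(const toy_toc *toc, uint64_t key, toy_dsa_pointer *value)
{
    size_t i;

    for (i = 0; i < toc->nentries; i++)
    {
        if (toc->entries[i].key == key)
        {
            *value = toc->entries[i].value;
            return 1;
        }
    }
    return 0;
}
```

In this picture the leader would call toy_toc_insert after building, say, a
shared hash table in the DSA area, and each worker would call
toy_toc_lookup with the same key before resolving the pointer against its
own attachment to the area.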

One obvious problem is that this patch results in at least *two* DSM
segments being created for every parallel query execution: the main
segment used for parallel execution, and then the initial segment
managed by the DSA area. One thought is that DSA areas are the more
general mechanism, so perhaps we should figure out how to store
contents of the existing segment in it. The TOC interface would need
a few tweaks to be able to live in memory allocated with dsa_allocate,
and then we'd need to share that address with other backends so that
they could find it (cf. the current approach of finding the TOC at the
start of the segment). I haven't prototyped that yet. That'd involve
changing the wording "InitializeDSM" that appears in various places
including the FDW API, which has been putting me off...

This patch depends on dsa-v2.patch[1].

[1] https://www.postgresql.org/message-id/flat/CAEepm%3D1z5WLuNoJ80PaCvz6EtG9dN0j-KuHcHtU6QEfcPP5-qA%40mail.gmail.com

--
Thomas Munro
http://www.enterprisedb.com

Attachment Content-Type Size
dsa-area-for-executor-v1.patch application/octet-stream 5.8 KB
