Re: Creating a DSA area to provide work space for parallel execution

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>
Cc: Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Creating a DSA area to provide work space for parallel execution
Date: 2016-12-05 20:12:30
Message-ID: CA+TgmoZ_QnyPiUx_VHEqqLrW=pnt74QH3Z0rFfHjpbA5Dp5wgA@mail.gmail.com
Lists: pgsql-hackers

On Thu, Dec 1, 2016 at 6:35 AM, Thomas Munro
<thomas(dot)munro(at)enterprisedb(dot)com> wrote:
> On Sat, Nov 26, 2016 at 1:55 AM, Thomas Munro
> <thomas(dot)munro(at)enterprisedb(dot)com> wrote:
>> Here's a new version to apply on top of dsa-v7.patch.
>
> Here's a version to go with dsa-v8.patch.

Thomas has spent a fair amount of time beating me up off-list about
the fact that we have no way to recycle LWLock tranche IDs, and I've
spent a fair amount of time trying to defend that decision, but it's
really a problem here. This is all well and good the way it's written
provided that there's only one parallel context in existence at a
time, but should that number ever exceed 1, which it can, then the
tranche's array_base will be incorrect for the LWLocks in one area or
the other.

That *almost* doesn't matter. If you're not compiling with dtrace or
LOCK_DEBUG or LWLOCK_STATS, you'll be fine. And if you are compiling
with one of those things, I believe the consequences will be no worse
than an occasional nonsensical lock ID. It's halfway tempting to just
accept that as a known wart, but, uggh, that sucks.

It's a bit hard to come up with a better alternative. We could set
aside a certain number of tranche IDs for parallel contexts, say 16.
Then as long as the same backend doesn't try to create more than 16
parallel contexts at the same time, we'll be fine. And a backend
really shouldn't have more than 16 Gather nodes active at the same
time. But it could. In fact, even if the planner never created more
than one Gather node in the same plan (which it sometimes does), the
leader can have multiple parallel contexts active for different
queries at the same time. It could fire off a Gather, and then some
part of that plan that's running in the master could call a function
that executes some OTHER query which also happens to involve a Gather;
while the master is executing its part of THAT query, the same thing
could happen all over again. So any arbitrary limit we install here
can be exceeded in, admittedly, extremely contrived scenarios.

Moreover, the LWLock tranche system itself is limited to a 16-bit
integer for tranche IDs, so there can't be more than 65536 of those
over the lifetime of the cluster no matter what. Since
008608b9d51061b1f598c197477b3dc7be9c4a64, that limit is rather
pointless; as things stand today, we could change the tranche ID to a
32-bit integer without losing anything. But changing that might eat
up bit space that somebody wants to use later to solve some other
problem, and anyway by itself it doesn't fix anything. Also, it would
allow LWLockTrancheArray to get regrettably large.

Another alternative is to have the DSA system fix up the tranche base
address every time we enter a DSA function that might need to take a
lock. The DSA code never returns with any relevant locks held, so the
base address can't get obsoleted by some other DSA area while a lock
using the old base address is still held. That's sort of ugly too, and it
adds a little overhead to fix a problem that will bite almost nobody,
but it's formally correct and seems fairly watertight. Basically each
DSA function that needs to take a lock would need to first do this:

area->lwlock_tranche.array_base = &area->control->pools[0];
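
Concretely, that assignment could live in a tiny helper, sketched here
with an invented name and called at the top of dsa_allocate(),
dsa_free(), and any other entry point that might acquire one of the
area's LWLocks:

static inline void
dsa_refresh_tranche_base(dsa_area *area)
{
    /* Make T_ID()'s pointer arithmetic valid for this area's locks. */
    area->lwlock_tranche.array_base = &area->control->pools[0];
}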

Maybe that's not too bad... thoughts?

BTW, I just noticed that the dsa_area_control member called "lock" is
going to get a fairly random lock ID, based on the number of bytes by
which "lock" follows "pools". Wouldn't it be better to put all of the
LWLocks - both the general locks and the per-pool locks - in one
array, probably with the general lock first, so that T_ID() will do
the right thing for such locks?
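
Something along these lines, with the field and bound names invented
just to illustrate the layout I mean:

typedef struct dsa_area_control
{
    /* ... other control data ... */
    LWLock      locks[1 + DSA_NUM_POOLS];   /* locks[0] is the general
                                             * lock, locks[1..] are the
                                             * per-pool locks */
} dsa_area_control;

A single array_base would then cover every lock in the area, and
T_ID() would come out sane for all of them.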

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
