Re: CREATE TABLE with parallel workers, 10.0?

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Joshua Chamberlain <josh(at)zephyri(dot)co>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CREATE TABLE with parallel workers, 10.0?
Date: 2017-02-15 13:48:44
Message-ID: CA+TgmoZC5ft_t9uQWSO5_1vU6H8oVyD=zyuLvRnJqTN==fvnhg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Feb 15, 2017 at 12:24 AM, Joshua Chamberlain <josh(at)zephyri(dot)co> wrote:
> Hello,
>
> (I'm posting to hackers since I got no response on the general list.)
>
> I use Postgres + PostGIS quite heavily, and recently have been taking full
> advantage of the new parallelism in 9.6. I'm now running queries in a few
> hours that would otherwise take more than a day.
>
> However, parallelism is disabled for all queries that perform writes (as
> documented). I would normally run "CREATE TABLE AS [some super-expensive
> query]", but since that can't use parallelism I'm using the \o option in
> psql, creating the table separately, and then \copy-ing in the results. That
> works, but "CREATE TABLE AS" would be more convenient.
>
> Are there plans in 10.0 to allow parallelism in queries that write, or at
> least in "CREATE TABLE AS" queries? (Support in materialized views would be
> great, too!)
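
[For concreteness, the workaround described above looks roughly like the following psql session; the table, column, function, and file names are invented for illustration:]

```sql
-- Run the expensive read-only query (which can use parallel workers in 9.6),
-- capturing psql's output to a file.  expensive_calculation is a stand-in
-- for whatever costly expression the real query computes.
\o /tmp/results.txt
\pset tuples_only on
\pset format unaligned
\pset fieldsep '\t'
SELECT id, expensive_calculation(geom) FROM big_table;
\o
-- Then create the target table separately and load the results back in:
CREATE TABLE results (id integer, value double precision);
\copy results FROM '/tmp/results.txt'
```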

Somebody else asked me about this in private email. Nobody at
EnterpriseDB is working on this right now, and I don't think anybody
else is working on it either. There are several ways to do it, but
none of them are entirely easy and the ones likely to perform better
are less easy.

I think that what we need to do to make this work is come up with a
way to fix the interaction between group locking (which allows
multiple processes to hold mutually conflicting locks at the same time
if they are in a parallel group) and relation extension (which uses
heavyweight locking to prevent multiple processes from trying to
extend the same relation at the same time). I think that perhaps the
right way to solve that problem is to come up with a way of providing
mutual exclusion around relation extension that doesn't need the
heavyweight lock manager. Relation extension locks don't need
multiple lock modes or deadlock detection or any of the other
frammishes that the heavyweight lock manager provides, but they do
need to be fast, which the heavyweight lock manager isn't especially
good at. So moving this out of the heavyweight lock manager might be
a way to solve two problems at once.

There's also a hazard related to GIN indexes since
e2c79e14d998cd31f860854bc9210b37b457bb01, which introduced a new use
of Page locks, which have a similar kind of problem. We got rid of
the major existing use of page locks in
6d46f4783efe457f74816a75173eb23ed8930020, which extirpated them from
hash indexes, and I was kind of hoping they could go away altogether,
but we can't do that as long as GIN is using them.

Anyway, if we solve those problems, we can allow inserts (not updates
or deletes, those have other problems, principally relating to combo
CIDs) in parallel mode, which would make it possible to allow the
kinds of things you are asking about here. Then you could fix things
so that each worker generates a subset of the tuples and inserts the
ones it generates. You'd end up with a parallel plan but no Gather
node! The workers could feed tuples directly to a suitable
DestReceiver, which would be really spiffy.
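Concretely (with made-up table and function names), the goal would be for a statement like this to run under a fully parallel plan:

```sql
-- Hypothetical end state: the whole statement runs in parallel, each worker
-- inserting the tuples it computes directly into the new table -- no Gather
-- node required.  (expensive_calculation is a stand-in name.)
CREATE TABLE results AS
SELECT id, expensive_calculation(geom) AS value
FROM big_table;
```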

The other way of fixing this problem is to have each worker generate a
subset of the tuples and funnel them all back to the leader through a
Gather node; the leader then does all the inserts. That avoids having
to solve the problems mentioned above, but it probably doesn't perform
nearly as well.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
