Re: CREATE TABLE with parallel workers, 10.0?

From: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Joshua Chamberlain <josh(at)zephyri(dot)co>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: CREATE TABLE with parallel workers, 10.0?
Date: 2017-02-16 00:49:21
Message-ID: CAJrrPGeT3-=nmZUF4jFCgCi629Kx_XXad2Zx5yOjhJ4m8gFFOw@mail.gmail.com
Lists: pgsql-hackers

On Thu, Feb 16, 2017 at 12:48 AM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Wed, Feb 15, 2017 at 12:24 AM, Joshua Chamberlain <josh(at)zephyri(dot)co>
> wrote:
> > Hello,
> >
> > (I'm posting to hackers since I got no response on the general list.)
> >
> > I use Postgres + PostGIS quite heavily, and recently have been taking
> full
> > advantage of the new parallelism in 9.6. I'm now running queries in a few
> > hours that would otherwise take more than a day.
> >
> > However, parallelism is disabled for all queries that perform writes (as
> > documented). I would normally run "CREATE TABLE AS [some super-expensive
> > query]", but since that can't use parallelism I'm using the \o option in
> > psql, creating the table separately, and then \copy-ing in the results.
> That
> > works, but "CREATE TABLE AS" would be more convenient.
> >
> > Are there plans in 10.0 to allow parallelism in queries that write, or at
> > least in "CREATE TABLE AS" queries? (Support in materialized views would
> be
> > great, too!)
>
> Somebody else asked me about this in private email. Nobody at
> EnterpriseDB is working on this right now, and I don't think anybody
> else is working on it either. There are several ways to do it, but
> none of them are entirely easy and the ones likely to perform better
> are less easy.
>
> I think that what we need to do to make this work is come up with a
> way to fix the interaction between group locking (which allows
> multiple processes to hold mutually conflicting locks at the same time
> if they are in a parallel group) and relation extension (which uses
> heavyweight locking to prevent multiple processes from trying to
> extend the same relation at the same time). I think that perhaps the
> right way to solve that problem is to come up with a way of providing
> mutual exclusion around relation extension that doesn't need the
> heavyweight lock manager. Relation extension locks don't need
> multiple lock modes or deadlock detection or any of the other
> frammishes that the heavyweight lock manager provides, but they do
> need to be fast, which the heavyweight lock manager isn't especially
> good at. So moving this out of the heavyweight lock manager might be
> a way to solve two problems at once.
>
> There's also a hazard related to GIN indexes since
> e2c79e14d998cd31f860854bc9210b37b457bb01, which introduced a new use
> of Page locks, which have a similar kind of problem. We got rid of
> the major existing use of page locks in
> 6d46f4783efe457f74816a75173eb23ed8930020, which extirpated them from
> hash indexes, and I was kind of hoping they could go away altogether,
> but we can't do that as long as GIN is using them.
>
> Anyway, if we solve those problems, we can allow inserts (not updates
> or deletes, those have other problems, principally relating to combo
> CIDs) in parallel mode, which would make it possible to allow the
> kinds of things you are asking about here. Then you could fix things
> so that each worker generates a subset of the tuples and inserts the
> ones it generates. You'd end up with a parallel plan but no Gather
> node! The workers could feed tuples directly to a suitable
> DestReceiver, which would be really spiffy.
>
> The other way of fixing this problem is to have each worker generate a
> subset of the tuples and funnel them all back to the leader through a
> Gather node; the leader then does all the inserts. That avoids having
> to solve the problems mentioned above, but it probably doesn't perform
> nearly as well.
>

How about supporting something where the backend does the write operations
while the workers produce the results? This may not perform as well in every
case as having all the parallel workers do the writes, but it could still be
useful for some scenarios, such as CREATE MATERIALIZED VIEW.

Below are explain plans produced with minimal changes to the code to allow
write operations. I haven't verified all the scenarios. How about supporting
write operations as below first, and later enhancing it so the parallel
workers do the writes as well?

POC patch is attached.

postgres=# explain create materialized view mat_view as select * from tbl
where f1 = 10;
                                QUERY PLAN
------------------------------------------------------------------------
 Gather  (cost=1000.00..37458.43 rows=1 width=214)
   Workers Planned: 2
   ->  Parallel Seq Scan on tbl  (cost=0.00..36458.33 rows=1 width=214)
         Filter: (f1 = 10)
(4 rows)

postgres=# explain insert into tbl select * from tbl where f1 = 10;
                                      QUERY PLAN
------------------------------------------------------------------------------------
 Insert on tbl  (cost=1000.00..37458.43 rows=1 width=214)
   ->  Gather  (cost=1000.00..37458.43 rows=1 width=214)
         Workers Planned: 2
         ->  Parallel Seq Scan on tbl tbl_1  (cost=0.00..36458.33 rows=1 width=214)
               Filter: (f1 = 10)
(5 rows)

postgres=# explain update tbl set f1 = 10 where f1 = 10;
                                   QUERY PLAN
------------------------------------------------------------------------------
 Update on tbl  (cost=1000.00..37458.43 rows=1 width=220)
   ->  Gather  (cost=1000.00..37458.43 rows=1 width=220)
         Workers Planned: 2
         ->  Parallel Seq Scan on tbl  (cost=0.00..36458.33 rows=1 width=220)
               Filter: (f1 = 10)
(5 rows)

Regards,
Hari Babu
Fujitsu Australia

Attachment Content-Type Size
parallel_writer_poc_1.patch application/octet-stream 5.9 KB
