Re: Parallel Append implementation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append implementation
Date: 2017-04-05 14:44:40
Message-ID: CA+TgmoaR9uMrV1S8Sy+OMv-Wu_GUeQaM0x-PgrhLP9b6Thw-3Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Apr 4, 2017 at 4:13 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
> I'm quite unconvinced that just throwing a log() in there is the best
> way to combat that. Modeling the issue of starting more workers through
> tuple transfer, locking, startup overhead costing seems a better
> approach to me.

Knock yourself out. There's no doubt that the way the number of
parallel workers is computed is pretty stupid right now, and it
obviously needs to get a lot smarter before we can consider doing
things like throwing 40 workers at a query. If you throw 2 or 4
workers at a query and it turns out that it doesn't help much, that's
sad, but if you throw 40 workers at a query and it turns out that it
doesn't help much, or even regresses, that's a lot sadder. The
existing system does try to model startup and tuple transfer overhead
during costing, but only as a way of comparing parallel plans to each
other or to non-parallel plans, not to work out the right number of
workers. It also does not model contention, which it absolutely needs
to do. I was kind of hoping that once the first version of parallel
query was committed, other developers who care about the query planner
would be motivated to improve some of this stuff, but so far that
hasn't really happened. This release adds a decent number of new
execution capabilities, and there is a lot more work to be done there,
but without some serious work on the planner end of things I fear
we're never going to be able to get more than ~4x speedup out of
parallel query, because we're just too dumb to know how many workers
we really ought to be using.
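
Just to make that concrete, here's the current heuristic in rough
outline. This is an illustrative sketch only - the names and structure
are mine, not the actual source - but the tripling rule matches what
the planner does today: one worker at the minimum parallel-scan
threshold, plus one more each time the relation triples in size beyond
it. Nothing in it models contention, startup cost, or tuple transfer.

static int
guess_parallel_workers(double heap_pages, double threshold_pages,
                       int max_workers)
{
    /* Illustrative only: one worker at the threshold, plus one more
     * each time the relation triples in size beyond it. */
    int         workers = 0;

    if (heap_pages >= threshold_pages)
    {
        workers = 1;
        while (heap_pages >= 3 * threshold_pages)
        {
            workers++;
            threshold_pages *= 3;
        }
    }

    return (workers < max_workers) ? workers : max_workers;
}

So a table just past the threshold gets one worker, 3x that gets two,
9x gets three, and so on; you'd need an absurdly large table before
this function would ever say 40.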

That having been said, I completely and emphatically disagree that
this patch ought to be required to be an order-of-magnitude smarter
than the existing logic in order to get committed. There are four
main things that this patch can hope to accomplish:

1. If we've got an Append node with children that have a non-zero
startup cost, it is currently pretty much guaranteed that every worker
will pay the startup cost for every child. With Parallel Append, we
can spread the workers out across the plans, and once a plan has been
finished by however many workers it got, other workers can ignore it,
which means that its startup cost need not be paid by those workers
(see the coordination sketch after this list). This case will arise a
lot more frequently once we have partition-wise join.

2. When the Append node's children are partial plans, spreading out
the workers reduces contention for whatever locks those workers use to
coordinate access to shared data.

3. If the Append node represents a scan of a partitioned table, and
the partitions are on different tablespaces (or there's just enough
I/O bandwidth available in a single tablespace to read more than one
of them at once without slowing things down), then spreading out the
work gives us I/O parallelism. This is an area where some
experimentation and benchmarking is needed, because there is a
possibility of regressions if we run several sequential scans on the
same spindle in parallel instead of consecutively. We might need to
add some logic to try to avoid this, but it's not clear how that logic
should work.

4. If the Append node is derived from a UNION ALL query, we can run
different branches in different processes even if the branch plans are
not themselves parallelizable (an illustrative plan shape follows this
list). This was proposed by Stephen among others as an "easy" case
for parallelism, which was maybe a tad optimistic, but it's sad that
we're going to release v10 without having done anything about it.
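
To make #1 and #2 concrete, here's a minimal sketch of the kind of
shared state such an executor node can use to spread workers out. The
names here are hypothetical, not lifted from the patch: each
participant picks an unfinished subplan with the fewest workers on it,
and a subplan that's been marked finished is skipped outright, so its
startup cost is paid only by the workers that actually ran it.

#include "storage/spin.h"

#define PA_MAX_SUBPLANS 64      /* hypothetical fixed cap, for brevity */

typedef struct ParallelAppendShared
{
    slock_t     mutex;          /* protects the two arrays below */
    bool        finished[PA_MAX_SUBPLANS];
    int         nworkers[PA_MAX_SUBPLANS];
} ParallelAppendShared;

/*
 * Pick the least-crowded unfinished subplan, or -1 if they're all
 * done.  Spreading workers this way is what reduces both repeated
 * startup costs (#1) and contention on per-subplan locks (#2).
 */
static int
choose_next_subplan(ParallelAppendShared *shared, int nplans)
{
    int         i;
    int         best = -1;

    SpinLockAcquire(&shared->mutex);
    for (i = 0; i < nplans; i++)
    {
        if (shared->finished[i])
            continue;
        if (best == -1 || shared->nworkers[i] < shared->nworkers[best])
            best = i;
    }
    if (best != -1)
        shared->nworkers[best]++;
    SpinLockRelease(&shared->mutex);

    return best;
}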
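
And for #4, the plan shape we'd hope for from a UNION ALL over two
tables whose branch plans can't be parallelized individually would be
something like this (illustrative, not actual EXPLAIN output; table
names are made up):

Gather
  Workers Planned: 2
  ->  Parallel Append
        ->  Seq Scan on foo
        ->  Seq Scan on bar

Each branch still runs in a single process, but the two branches can
run at the same time in different workers.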

All of those things (except possibly #3) are wins over the status quo
even if the way we choose the number of workers is still pretty dumb.
It shouldn't get away with being dumber than what we've already got,
but it shouldn't be radically smarter - or even just radically
different - because, if it is, then the results you get when you query
a partitioned table will be very different from what you get when you
query an unpartitioned table, which is not sensible. I very much agree
that doing something smarter than log-scaling on the number of workers
is a good project for somebody to do, but it's not *this* project.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
