Re: Parallel Append can break run-time partition pruning

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Parallel Append can break run-time partition pruning
Date: 2020-04-28 13:31:15
Message-ID: CA+TgmoarD5+pDLLg4Lh73b2xXO4XjBbZbsBnCM-OZ3U8+6+hzQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 25, 2020 at 7:25 AM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> I looked at this again and I don't think what I've got is right.

Apart from the considerations which you raised, I think there's also a
costing issue here. For instance, suppose we have an Append with three
children. Two of them have partial paths which will complete after
consuming 1 second of CPU time, and using a partial path is
unquestionably the best strategy for those children. The third one has
a partial path which will complete after using 10 seconds of CPU time
and a non-partial path which will complete after using 8 seconds of
CPU time. What strategy is best? If we pick the non-partial path, the
time that the whole thing takes to complete will be limited by the
non-partial path, which, since it cannot use parallelism, will take 8
seconds of wall clock time. If we pick the partial path, we have a
total of 12 seconds of CPU time which we can finish in 6 wall clock
seconds with 2 participants, 4 seconds with 3 participants, or 3
seconds with 4 participants. This is clearly a lot better.

Incidentally, the planner knows about the fact that you can't finish
an Append faster than you can finish its non-partial participants. See
append_nonpartial_cost().

Now, I don't think anything necessarily gets broken here by your patch
as written. Planner decisions are made on estimated cost, which is a
proxy for wall clock time, not CPU time. Right now, we only prefer the
non-partial path if we think it can be executed in less time by ONE
worker than the partial path could be executed by ALL workers:

/*
* Either we've got only a non-partial
path, or we think that
* a single backend can execute the
best non-partial path
* faster than all the parallel
backends working together can
* execute the best partial path.

So, in the above example, we wouldn't even consider the non-partial
path. Even say we just have 2 workers. The partial path will have a
cost of 6, which is less than 8, so it gets rejected. But as the
comment goes on to note:

* It might make sense to be more
aggressive here. Even if
* the best non-partial path is more
expensive than the best
* partial path, it could still be
better to choose the
* non-partial path if there are
several such paths that can
* be given to different workers. For
now, we don't try to
* figure that out.

This kind of strategy is particularly appealing when the number of
Append children is large compared to the number of workers. For
instance, if you have an Append with 100 children and you are planning
with 4 workers, it's probably a pretty good idea to be very aggressive
about picking the path that uses the least resources, which the
current algorithm would not do. You're unlikely to end up with idle
workers, because you have so many plans to which they can be
allocated. However, it's possible: it could be that there's one child
which is way bigger than all the others and the really important thing
is to get a partial path for that child, so that it can be attacked in
parallel, even if that means that overall resource consumption is
higher. As the number of children decreases relative to the number of
workers, having partial paths becomes increasingly appealing. To take
a degenerate case, suppose you have 4 workers but only 2 children.
Partial paths should look really appealing, because the alternative is
leaving workers unused.

I *think* your patch is based around the idea of merging the
turn-the-append-of-partial-paths-into-a-parallel-append case with the
consider-a-mix-of-partial-and-nonpartial-paths-for-parallel-append
case. That seems possible to do given that the current heuristic is to
compare the raw path costs, but I think that it wouldn't work if we
wanted to be more aggressive about considering the mixed strategy and
letting the costing machinery sort out which way is better.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2020-04-28 13:43:43 Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays
Previous Message Tomas Vondra 2020-04-28 13:26:26 Re: Binary search in ScalarArrayOpExpr for OR'd constant arrays