Re: Parallel Append implementation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append implementation
Date: 2017-03-16 20:07:34
Message-ID: CA+TgmoaLRtaWdJVHfhHej2s7w1spbr6gZiZXJrM5bsz1KQ54Rw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Mar 16, 2017 at 6:27 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> Attached is an updated patch v7, which does the above.

Some comments:

- You've added a GUC (which is good) but not documented it (which is
bad) or added it to postgresql.conf.sample (also bad).

- You've used a loop inside a spinlock-protected critical section,
which is against project policy. Use an LWLock; define and document a
new builtin tranche ID.

- The comment for pa_finished claims that it is the number of workers
executing the subplan, but it's a bool, not a count; I think this
comment is just out of date.

- paths_insert_sorted_by_cost() is a hand-coded insertion sort. Can't
we find a way to use qsort() for this instead of hand-coding a slower
algorithm? I think we could just create an array of the right length,
stick each path into it from add_paths_to_append_rel, and then qsort()
the array based on <is-partial, total-cost>. Then the result can be
turned into a list.

- Maybe the new helper functions in nodeAppend.c could get names
starting with exec_append_, to match the style of
exec_append_initialize_next().

- There's a superfluous whitespace change in add_paths_to_append_rel.

- The substantive changes in add_paths_to_append_rel don't look right
either. It's not clear why accumulate_partialappend_subpath is
getting called even in the non-enable_parallelappend case. I don't
think the logic for the case where we're not generating a parallel
append path needs to change at all.

- When parallel append is enabled, I think add_paths_to_append_rel
should still consider all the same paths that it does today, plus one
extra. The new path is a parallel append path where each subpath is
the cheapest subpath for that childrel, whether partial or
non-partial. If !enable_parallelappend, or if all of the cheapest
subpaths are partial, then skip this. (If all the cheapest subpaths
are non-partial, it's still potentially useful.) In other words,
don't skip consideration of parallel append just because you have a
partial path available for every child rel; it could be

- I think the way cost_append() works is not right. What you've got
assumes that you can just multiply the cost of a partial plan by the
parallel divisor to recover the total cost, which is not true because
we don't divide all elements of the plan cost by the parallel divisor
-- only the ones that seem like they should be divided. Also, it
could be smarter about what happens with the costs of non-partial
paths. I suggest the following algorithm instead.

1. Add up all the costs of the partial paths. Those contribute
directly to the final cost of the Append. This ignores the fact that
the Append may escalate the parallel degree, but I think we should
just ignore that problem for now, because we have no real way of
knowing what the impact of that is going to be.

2. Next, estimate the cost of the non-partial paths. To do this, make
an array of Cost of that length and initialize all the elements to
zero, then add the total cost of each non-partial plan in turn to the
element of the array with the smallest cost, and then take the maximum
of the array elements as the total cost of the non-partial plans. Add
this to the result from step 1 to get the total cost.

- In get_append_num_workers, instead of the complicated formula with
log() and 0.693, just add the list lengths and call fls() on the
result. Integer arithmetic FTW!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nikolay Samokhvalov 2017-03-16 20:08:53 Re: ON CONFLICT with constraint name doesn't work
Previous Message Peter Geoghegan 2017-03-16 19:51:18 Re: ON CONFLICT with constraint name doesn't work