| From: | Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> | 
|---|---|
| To: | Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> | 
| Cc: | Robert Haas <robertmhaas(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> | 
| Subject: | Re: Parallel Append implementation | 
| Date: | 2017-09-30 13:51:09 | 
| Message-ID: | CAA4eK1KQm=osRdz499C_J2sm6pf14MbhF6bYvLyA7AxQZZE8nQ@mail.gmail.com | 
| Views: | Whole Thread | Raw Message | Download mbox | Resend email | 
| Thread: | |
| Lists: | pgsql-hackers | 
On Wed, Sep 20, 2017 at 10:59 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> On 16 September 2017 at 10:42, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> At a broader level, the idea is good, but I think it won't turn out
>> exactly like that considering your below paragraph which indicates
>> that it is okay if leader picks a partial path that is costly among
>> other partial paths as a leader won't be locked with that.
>> Considering this is a good design for parallel append, the question is
>> do we really need worker and leader to follow separate strategy for
>> choosing next path.  I think the patch will be simpler if we can come
>> up with a way for the worker and leader to use the same strategy to
>> pick next path to process.  How about we arrange the list of paths
>> such that first, all partial paths will be there and then non-partial
>> paths and probably both in decreasing order of cost.  Now, both leader
>> and worker can start from the beginning of the list. In most cases,
>> the leader will start at the first partial path and will only ever
>> need to scan non-partial path if there is no other partial path left.
>> This is not bulletproof as it is possible that some worker starts
>> before leader in which case leader might scan non-partial path before
>> all partial paths are finished, but I think we can avoid that as well
>> if we are too worried about such cases.
>
> If there are no partial subpaths, then again the leader is likely to
> take up the expensive subpath. And this scenario would not be
> uncommon.
>
While thinking about how common the case of no partial subpaths would
be, it occurred to me that as of now we always create a partial path
for the inheritance child if it is parallel-safe and the user has not
explicitly set the value of parallel_workers to zero (refer
compute_parallel_worker).  So, unless you are planning to change that,
I think it will be quite uncommon to have no partial subpaths.
Few nitpicks in your latest patch:
1.
@@ -298,6 +366,292 @@ ExecReScanAppend(AppendState *node)
  if (subnode->chgParam == NULL)
  ExecReScan(subnode);
  }
+
Looks like a spurious line.
2.
@@ -1285,7 +1291,11 @@ add_paths_to_append_rel(PlannerInfo *root,
RelOptInfo *rel,
..
+ if (chosen_path && chosen_path != cheapest_partial_path)
+ pa_all_partial_subpaths = false;
It will keep on setting pa_all_partial_subpaths as false for
non-partial paths which don't seem to be the purpose of this variable.
I think you want it to be set even when there is one non-partial path,
so isn't it better to write as below or something similar:
if (pa_nonpartial_subpaths && pa_all_partial_subpaths)
pa_all_partial_subpaths = false;
-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Robert Haas | 2017-09-30 14:34:25 | Re: 64-bit queryId? | 
| Previous Message | Devrim Gündüz | 2017-09-30 12:41:53 | Re: extension build issue with PostgreSQL 10 on Centos6 |