From: | Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> |
---|---|

To: | Robert Haas <robertmhaas(at)gmail(dot)com> |

Cc: | Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org> |

Subject: | Re: Parallel Append implementation |

Date: | 2017-03-23 19:08:49 |

Message-ID: | CAJ3gD9fbmfr2uCT4Vr_pAtZZebsgaMWw2rP6xLgjHRKWebdRSA@mail.gmail.com |

Views: | Raw Message | Whole Thread | Download mbox |

Thread: | |

Lists: | pgsql-hackers |

On 23 March 2017 at 16:26, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:

> On 23 March 2017 at 05:55, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

>> On Wed, Mar 22, 2017 at 4:49 AM, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>

wrote:

>>> Attached is the updated patch that handles the changes for all the

>>> comments except the cost changes part. Details about the specific

>>> changes are after the cost-related points discussed below.

>>>

>>> For non-partial paths, I was checking following 3 options :

>>>

>>> Option 1. Just take the sum of total non-partial child costs and

>>> divide it by number of workers. It seems to be getting close to the

>>> actual cost.

>>

>> If the costs for all children are about equal, then that works fine.

>> But when they are very unequal, then it's highly misleading.

>>

>>> Option 2. Calculate exact cost by an algorithm which I mentioned

>>> before, which is pasted below for reference :

>>> Persubpath cost : 20 16 10 8 3 1, with 3 workers.

>>> After 10 time units (this is minimum of first 3 i.e. 20, 16, 10), the

>>> times remaining are :

>>> 10 6 0 8 3 1

>>> After 6 units (minimum of 10, 06, 08), the times remaining are :

>>> 4 0 0 2 3 1

>>> After 2 units (minimum of 4, 2, 3), the times remaining are :

>>> 2 0 0 0 1 1

>>> After 1 units (minimum of 2, 1, 1), the times remaining are :

>>> 1 0 0 0 0 0

>>> After 1 units (minimum of 1, 0 , 0), the times remaining are :

>>> 0 0 0 0 0 0

>>> Now add up above time chunks : 10 + 6 + 2 + 1 + 1 = 20

>>

>

>> This gives the same answer as what I was proposing

>

> Ah I see.

>

>> but I believe it's more complicated to compute.

> Yes a bit, particularly because in my algorithm, I would have to do

> 'n' subtractions each time, in case of 'n' workers. But it looked more

> natural because it follows exactly the way we manually calculate.

>

>> The way my proposal would work in this

>> case is that we would start with an array C[3] (since there are three

>> workers], with all entries 0. Logically C[i] represents the amount of

>> work to be performed by worker i. We add each path in turn to the

>> worker whose array entry is currently smallest; in the case of a tie,

>> just pick the first such entry.

>>

>> So in your example we do this:

>>

>> C[0] += 20;

>> C[1] += 16;

>> C[2] += 10;

>> /* C[2] is smaller than C[0] or C[1] at this point, so we add the next

>> path to C[2] */

>> C[2] += 8;

>> /* after the previous line, C[1] is now the smallest, so add to that

>> entry next */

>> C[1] += 3;

>> /* now we've got C[0] = 20, C[1] = 19, C[2] = 18, so add to C[2] */

>> C[2] += 1;

>> /* final result: C[0] = 20, C[1] = 19, C[2] = 19 */

>>

>> Now we just take the highest entry that appears in any array, which in

>> this case is C[0], as the total cost.

>

> Wow. The way your final result exactly tallies with my algorithm

> result is very interesting. This looks like some maths or computer

> science theory that I am not aware.

>

> I am currently coding the algorithm using your method.

While I was coding this, I was considering if Path->rows also should

be calculated similar to total cost for non-partial subpath and total

cost for partial subpaths. I think for rows, we can just take

total_rows divided by workers for non-partial paths, and this

approximation should suffice. It looks odd that it be treated with the

same algorithm we chose for total cost for non-partial paths.

Meanwhile, attached is a WIP patch v10. The only change in this patch

w.r.t. the last patch (v9) is that this one has a new function defined

append_nonpartial_cost(). Just sending this to show how the algorithm

looks like; haven't yet called it.

Attachment | Content-Type | Size |
---|---|---|

ParallelAppend_v10_WIP.patch | application/octet-stream | 56.6 KB |

- Re: Parallel Append implementation at 2017-03-23 10:56:26 from Amit Khandekar

- Re: Parallel Append implementation at 2017-03-24 07:41:54 from Rajkumar Raghuwanshi
- Re: Parallel Append implementation at 2017-03-24 16:02:57 from Amit Khandekar

From | Date | Subject | |
---|---|---|---|

Next Message | Fabrízio de Royes Mello | 2017-03-23 19:14:58 | Re: [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM |

Previous Message | Alvaro Herrera | 2017-03-23 18:58:49 | Re: [PATCH] Move all am-related reloption code into src/backend/access/[am-name] and get rid of relopt_kind for custom AM |