Re: Parallel Append implementation

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>, Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Append implementation
Date: 2017-09-30 15:55:44
Message-ID: CA+Tgmob-ZrmgSWaojtXWtvFvYgo9uLdJRUUtiL7y6wHe2f9SZA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Sep 30, 2017 at 12:20 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Okay, but the point is whether it will make any difference
> practically. Let us try to see with an example, consider there are
> two children (just taking two for simplicity, we can extend it to
> many) and first having 1000 pages to scan and second having 900 pages
> to scan, then it might not make much difference which child plan
> leader chooses. Now, it might matter if the first child relation has
> 1000 pages to scan and second has just 1 page to scan, but not sure
> how much difference will it be in practice considering that is almost
> the maximum possible theoretical difference between two non-partial
> paths (if a relation has more than 1024 pages
> (min_parallel_table_scan_size), then it will have a partial path).

But that's comparing two non-partial paths for the same relation --
the point here is to compare across relations. Also keep in mind
scenarios like this:

SELECT ... FROM relation UNION ALL SELECT ... FROM generate_series(...);

>> It's a lot fuzzier what is best when there are only partial plans.
>>
>
> The point that bothers me a bit is whether it is a clear win if we
> allow the leader to choose a different strategy to pick the paths or
> is this just our theoretical assumption. Basically, I think the patch
> will become simpler if we pick some simple strategy to choose paths.

Well, that's true, but is it really that much complexity?

And I actually don't see how this is very debatable. If the only
paths that are reasonably cheap are GIN index scans, then the only
strategy is to dole them out across the processes you've got. Giving
the leader the cheapest one seems to me to be clearly smarter than any
other strategy. Am I missing something?
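
To make the strategy above concrete, here is a minimal sketch in Python. It
is only an illustration of the argument, not code from the patch: the
function name, the process labels, and the cost numbers are all hypothetical,
and it models only the first round of assignments. It assumes the workers
take the most expensive remaining subplans, which the text above does not
state; the only part taken from the argument is that the leader gets the
cheapest one.

```python
from collections import deque


def first_round_assignment(costs, n_workers):
    """Hand out subplans to the leader and workers (illustrative only).

    costs      -- estimated cost of each subplan, indexed by subplan number
    n_workers  -- number of parallel workers besides the leader

    Returns a dict mapping a process label to the subplan index it starts on.
    """
    # Order subplan indexes from most to least expensive.
    by_cost_desc = sorted(range(len(costs)), key=lambda i: costs[i], reverse=True)
    remaining = deque(by_cost_desc)

    assignment = {}

    # The leader takes the cheapest subplan (the right end of the deque).
    if remaining:
        assignment["leader"] = remaining.pop()

    # Assumption: each worker takes the most expensive subplan still available.
    for w in range(n_workers):
        if not remaining:
            break
        assignment[f"worker{w}"] = remaining.popleft()

    return assignment


if __name__ == "__main__":
    # Hypothetical costs for four non-partial subplans.
    print(first_round_assignment([50, 10, 40, 5], n_workers=2))
```

With these made-up costs the leader starts on subplan 3 (cost 5) and the two
workers start on subplans 0 and 2; subplan 1 is left for whichever process
finishes first.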

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
