Re: Parallel Seq Scan

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: José Luis Tallón <jltallon(at)adv-solutions(dot)net>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2014-12-05 18:57:34
Message-ID: 5482001E.20001@BlueTreble.com
Lists: pgsql-hackers

On 12/5/14, 9:08 AM, José Luis Tallón wrote:
>
> Moreover, when load goes up, the relative cost of parallel work should go up as well.
> Something like:
> c = number of cores
> l = 1-minute load average
>
> additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1)
>
> (for c>1, of course)
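
Just to make sure I'm reading that right, here's a rough sketch of that adjustment (the function name and the fallback behaviour are mine, nothing like this exists in the tree; getloadavg() is the BSD/glibc call for the 1-minute load average). On an 8-core box with a load of 3 the multiplier is (3+1)/(8-1) ~ 0.57, and it climbs past 1.0 once load exceeds cores minus 2:

#include <stdlib.h>

typedef double Cost;

/* Hypothetical: load-scaled penalty added to a parallel path's cost. */
static Cost
parallel_additional_cost(double tuple_estimate, Cost cpu_tuple_cost,
                         int n_cores)
{
    double  loadavg[1];

    /* If load is unavailable or there's only one core, apply no penalty. */
    if (n_cores <= 1 || getloadavg(loadavg, 1) != 1)
        return 0.0;

    /* additional_cost = tuple estimate * cpu_tuple_cost * (l+1)/(c-1) */
    return tuple_estimate * cpu_tuple_cost *
           (loadavg[0] + 1.0) / (double) (n_cores - 1);
}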

...

> The parallel seq scan nodes are definitely the best approach for "parallel query", since the planner can optimize them based on cost.
> I'm also wondering about the ability to swap the implementation of some nodes at execution time: given a previously planned query (I'm specifically thinking about prepared statements here), chances are that at execution time a different implementation of the same "node" might be more suitable and could be used instead while that condition holds.

These comments got me wondering... would it be better to decide on parallelism during execution instead of at plan time? That would allow us to dynamically scale parallelism based on system load. If we don't even consider parallelism until we've pulled some number of tuples/pages from a relation, this would also eliminate all parallel overhead on small relations.
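
To make that concrete, I'm picturing something along these lines -- purely illustrative, every name below is made up for this email, none of it is an existing executor API: the scan runs single-threaded until it has returned some threshold number of tuples, and only then looks at the current load average to decide how many workers (if any) are worth launching.

#include <stdbool.h>
#include <stdlib.h>

#define PARALLEL_DECISION_THRESHOLD 10000   /* tuples before we even consider it */

/* Hypothetical per-scan runtime state. */
typedef struct SeqScanRuntimeState
{
    long    tuples_scanned;         /* tuples returned so far by this scan */
    bool    parallel_considered;    /* have we already made the decision? */
    int     nworkers_launched;
} SeqScanRuntimeState;

/* Hypothetical hook, called once per tuple returned by the scan node. */
static void
maybe_go_parallel(SeqScanRuntimeState *state, int max_workers)
{
    double  loadavg[1];
    int     nworkers;

    if (state->parallel_considered ||
        state->tuples_scanned++ < PARALLEL_DECISION_THRESHOLD)
        return;

    state->parallel_considered = true;

    /* Scale the worker count down as the 1-minute load average goes up. */
    if (getloadavg(loadavg, 1) != 1)
        return;

    nworkers = max_workers - (int) loadavg[0];
    if (nworkers > 0)
        state->nworkers_launched = nworkers;    /* launch workers here */
}

A small relation would never cross the threshold, so it would never pay any parallel setup cost at all.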
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com
