Re: Parallel Seq Scan

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-01-09 14:38:57
Message-ID: CAA4eK1L0dk9D3hARoAb84v2pGvUw4B5YoS4x18ORQREwR+1VCg@mail.gmail.com
Lists: pgsql-hackers

On Fri, Dec 19, 2014 at 7:57 PM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
>
>
> There's certainly documentation available from the other RDBMS' which
> already support parallel query, as one source. Other academic papers
> exist (and once you've linked into one, the references and prior work
> helps bring in others). Sadly, I don't currently have ACM access (might
> have to change that..), but there are publicly available papers also,

I have gone through a couple of papers describing what some other
databases do for parallel sequential scan. Here is a brief summary
of the same, and how I am planning to handle it in the patch:

Costing:
One of the papers [1] you suggested can be summarised as follows:
a. Startup costs are negligible if processes can be reused
rather than created afresh.
b. Communication cost consists of the CPU cost of sending
and receiving messages.
c. Communication costs can exceed the cost of operators such
as scanning, joining, or grouping.
These findings lead to the important conclusion that query
optimization should be concerned with communication costs
but not with startup costs.

In our case, since we don't currently have a mechanism to reuse parallel
workers, we need to account for startup cost as well. Based on that,
I am planning to add three new parameters: cpu_tuple_comm_cost,
parallel_setup_cost, and parallel_startup_cost.
* cpu_tuple_comm_cost - CPU cost of passing a tuple from a worker
to the master backend, with default value
DEFAULT_CPU_TUPLE_COMM_COST of 0.1; this will be multiplied
by the number of tuples expected to be selected.
* parallel_setup_cost - cost of setting up shared memory for parallelism,
with a default value of 100.0.
* parallel_startup_cost - cost of starting up parallel workers, with a
default value of 1000.0, multiplied by the number of workers decided
for the scan.

I will do some experiments to finalise the default values, but in general
I feel that building the cost model on the above parameters is sound.

Execution:
Most other databases do a partition-level scan, with each individual
parallel worker scanning a partition on a different disk. However,
it seems Amazon DynamoDB [2] works on something similar to what
I have used in the patch, i.e. on fixed blocks. I think this kind of
strategy is better than dividing the blocks at runtime, because
randomly dividing the blocks among workers could turn a parallel
sequential scan into a random scan.
Also, in whatever I have read (Oracle, DynamoDB), most databases
divide the work among workers while the master backend acts as
coordinator, at least as far as I could understand.

Let me know your opinion on the above.

I am planning to proceed with the above ideas to strengthen the patch,
in the absence of any objections or better ideas.

[1] : http://i.stanford.edu/pub/cstr/reports/cs/tr/96/1570/CS-TR-96-1570.pdf
[2] : http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/QueryAndScan.html#QueryAndScanParallelScan

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
