Support Parallel Query Execution in Executor

From: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Support Parallel Query Execution in Executor
Date: 2006-04-06 10:28:33
Message-ID: e12qms$2s6q$1@news.hub.org
Lists: pgsql-hackers pgsql-patches

I have written some experimental code that does a master-slave seqscan in
PostgreSQL. While doing this work, I came to feel that we already have enough
infrastructure to support parallel query execution.

What I did was add a new node, PARA, and plug it in above the node that we
want to execute in parallel. At this stage, a PARA node is essentially just a
SeqScan node, defined as:

typedef struct Para
{
    /* TODO: add a union to put all nodes supporting parallelism here */
    SeqScan     scan;

    /* Split / Merge / Redistribute */
    ParaType    type;

    /* TODO: other possible parameters */
} Para;

During execution, the master (the process that receives the query) wakes
up a slave process (an idle ordinary backend), and the slave passes the
scan results to the master via a shared-memory communication buffer. In
detail, the execution goes like this:

Master process:
1. PARA init: wake up a slave, pass the queryTree and outerPlan(planTree) to
   it by nodeToString();
2. PARA exec:
       get an item from the communication-buffer;
       if item is a valid tuple
           return item;
       else
           handle other types of item; /* execution done/error */
3. PARA end: do some cleanup.
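
The PARA exec step can be sketched in plain C as below. This is an illustration only, not the experimental patch: CommBuffer, CommItem, comm_put, comm_get and para_exec are hypothetical names, the buffer is an ordinary in-process ring with no locking, and a real master would wait on an empty buffer instead of returning PARA_WOULD_BLOCK.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define COMM_SLOTS   8      /* power of two, so the counters may wrap safely */
#define ITEM_PAYLOAD 64     /* assumed upper bound on one serialized tuple */

typedef enum { ITEM_TUPLE, ITEM_DONE, ITEM_ERROR } ItemKind;

typedef struct CommItem
{
    ItemKind kind;
    size_t   len;
    char     payload[ITEM_PAYLOAD];
} CommItem;

typedef struct CommBuffer
{
    CommItem slots[COMM_SLOTS];
    size_t   head;          /* count of items consumed so far */
    size_t   tail;          /* count of items produced so far */
} CommBuffer;

typedef enum
{
    PARA_GOT_TUPLE, PARA_FINISHED, PARA_FAILED, PARA_WOULD_BLOCK
} ParaStatus;

/* Slave side: append one item; returns 0 if the buffer is full. */
static int
comm_put(CommBuffer *b, ItemKind kind, const void *data, size_t len)
{
    if (b->tail - b->head == COMM_SLOTS || len > ITEM_PAYLOAD)
        return 0;
    CommItem *it = &b->slots[b->tail % COMM_SLOTS];
    it->kind = kind;
    it->len = len;
    if (len > 0)
        memcpy(it->payload, data, len);
    b->tail++;
    return 1;
}

/* Master side: remove the oldest item; returns 0 if the buffer is empty. */
static int
comm_get(CommBuffer *b, CommItem *out)
{
    if (b->head == b->tail)
        return 0;
    *out = b->slots[b->head % COMM_SLOTS];
    b->head++;
    return 1;
}

/* PARA exec: fetch one item and classify it as tuple, done, or error. */
static ParaStatus
para_exec(CommBuffer *b, CommItem *out)
{
    assert(b != NULL && out != NULL);
    if (!comm_get(b, out))
        return PARA_WOULD_BLOCK;
    switch (out->kind)
    {
        case ITEM_TUPLE: return PARA_GOT_TUPLE;
        case ITEM_DONE:  return PARA_FINISHED;
        case ITEM_ERROR: return PARA_FAILED;
    }
    return PARA_FAILED;
}
```

The caller would loop on para_exec, returning each tuple to the parent node until it sees PARA_FINISHED or PARA_FAILED.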

As we can see from the PARA init stage, even with the simplest PARA node it
is easy to support inter-node parallelism.

Slave process (uses code similar to the autovacuum process):
1. Get queryTree and planTree;
2. Redirect the destReceiver to the communication-buffer;
3. Encapsulate them in an executor and run;
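
Step 2 amounts to swapping the callback that consumes result rows. The sketch below shows that idea with a simplified stand-in: Receiver, CommSink, comm_receive, comm_shutdown and run_scan are hypothetical names, tuples are reduced to plain ints, and the sink is a local array, not shared memory. PostgreSQL's actual DestReceiver interface differs in its details.

```c
#include <assert.h>
#include <stddef.h>

#define SINK_CAPACITY 16

/* Stand-in for the communication buffer (assumption: tuples are ints). */
typedef struct CommSink
{
    int    values[SINK_CAPACITY];
    size_t count;
    int    done;                /* set once the scan has finished */
} CommSink;

/* Simplified receiver: a pair of callbacks plus private state. */
typedef struct Receiver Receiver;
struct Receiver
{
    int   (*receive)(Receiver *self, int tuple);  /* 0 = cannot accept more */
    void  (*shutdown)(Receiver *self);
    void  *state;
};

/* Receiver callback that writes each tuple into the sink. */
static int
comm_receive(Receiver *self, int tuple)
{
    CommSink *s = self->state;
    if (s->count == SINK_CAPACITY)
        return 0;
    s->values[s->count++] = tuple;
    return 1;
}

/* Receiver callback that marks the end of the result stream. */
static void
comm_shutdown(Receiver *self)
{
    CommSink *s = self->state;
    s->done = 1;
}

/* Toy executor: push every row to whatever receiver it was given. */
static size_t
run_scan(const int *rows, size_t nrows, Receiver *dest)
{
    size_t sent = 0;

    assert(dest != NULL);
    for (size_t i = 0; i < nrows; i++)
    {
        if (!dest->receive(dest, rows[i]))
            break;
        sent++;
    }
    dest->shutdown(dest);
    return sent;
}
```

Because run_scan only sees the Receiver callbacks, the same executor code can send rows to a client or to the master's buffer without modification.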

The query plan is like this:
TEST=# explain select max(a), max(b) from t;
                              QUERY PLAN
----------------------------------------------------------------------
 Aggregate  (cost=7269.01..7269.02 rows=1 width=53)
   ->  Para [Split = 1]  (cost=10.00..5879.00 rows=278000 width=53)
         ->  Seq Scan on T  (cost=0.00..5879.00 rows=278000 width=53)
(3 rows)

There are some problems I haven't addressed yet. The most difficult one for
me is xid assignment: the master and slaves should see an identical view,
and the key to that is the xid. I am not sure what the correct solution to
this problem is. We could use the same xid, or use a contiguous range of xids,
for the master and slaves. Other problems, such as the login problem (the
master and slaves should act as the same user) and elog message passing, are
also important, but I think we are able to handle them without any problem.

I haven't touched the most difficult part, the parallel query optimizer. But
thanks to the two-phase parallel optimization technique, this part can be
treated like the geqo optimizer: without enough evidence, we don't enable
parallel query execution.

Are there any show-stopping reasons not to do this?

Regards,
Qingqing
