Re: Support Parallel Query Execution in Executor

From: "Mike Rylander" <mrylander(at)gmail(dot)com>
To: "Qingqing Zhou" <zhouqq(at)cs(dot)toronto(dot)edu>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Support Parallel Query Execution in Executor
Date: 2006-04-07 12:59:46
Message-ID: b918cf3d0604070559r5d139018h3693ec95ac42092c@mail.gmail.com
Lists: pgsql-hackers pgsql-patches

On 4/6/06, Qingqing Zhou <zhouqq(at)cs(dot)toronto(dot)edu> wrote:
>
> ""Jonah H. Harris"" <jonah(dot)harris(at)gmail(dot)com> wrote
> >
> > Great work! I had looked into this a little bit and came to the same
> > ideas/problems you did, but none of them seemed insurmountable at all.
> > I'd be interested in working with you on this if you'd like.
> >

First, I want to second Jonah's enthusiasm. This is very exciting!

>
> Yes, I am happy to work with anyone on the topic. The plan in mind is like
> this:
> (1) stabilize the master-slave seqscan: solve all the problems left;
> (2) parallelize the seqscan: AFAICS, this should not be very difficult based on
> 1, may only need some scan partition assignment;

This is really only a gut feeling for me (it can't be otherwise, since
we can't yet test), but I think parallelizing a single seqscan is
pretty much guaranteed to do nothing, because seqscans, especially on
large tables, are IO bound.

There was a plan some time ago (during 8.0 beta, I think) to allow
multiple seqscans from different queries to join each other, such that
scans that begin later start scanning the table at the point, or just
behind the point, that the first running scan is already at. That
plan would reduce IO contention, and buffer and OS cache thrashing, by
having multiple readers pull from the same hose.
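
To illustrate what I mean by scans joining each other, here is a minimal
sketch in Python. It is not taken from any patch; the ScanPositions
registry and the scan_order helper are hypothetical names made up for
this example. A later scan asks where the running scan currently is,
starts there, and wraps around to pick up the blocks it skipped:

```python
from typing import Dict, Iterator


class ScanPositions:
    """Hypothetical shared registry: table name -> block most recently read."""

    def __init__(self) -> None:
        self._current: Dict[str, int] = {}

    def start_block(self, table: str) -> int:
        # Join a running scan at its reported position; otherwise start at 0.
        return self._current.get(table, 0)

    def report(self, table: str, block: int) -> None:
        # Called by a running scan after it reads a block.
        self._current[table] = block


def scan_order(start: int, nblocks: int) -> Iterator[int]:
    """Yield every block exactly once, starting at `start` and wrapping."""
    for i in range(nblocks):
        yield (start + i) % nblocks


positions = ScanPositions()
first = scan_order(positions.start_block("t"), 8)
for _ in range(5):
    positions.report("t", next(first))  # first scan has now read blocks 0..4

# A second scan that starts now begins at block 4 instead of block 0.
second = list(scan_order(positions.start_block("t"), 8))
print(second)
```

Both readers then move through the same region of the file together,
which is the "same hose" effect; the wrap-around is only there so the
late scan still sees every block once.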

I can't see how asking for more than one stream from the same file
would do anything but increase both cache thrashing and IO bandwidth
contention. Am I missing something here?
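
To make the concern concrete, I am assuming the "scan partition
assignment" in (2) means something like splitting the table's block
range into contiguous chunks, one per worker. That is only my guess at
the design; the function name below is hypothetical:

```python
from typing import List, Tuple


def assign_block_ranges(nblocks: int, nworkers: int) -> List[Tuple[int, int]]:
    """Split blocks [0, nblocks) into nworkers contiguous half-open ranges."""
    if nworkers <= 0:
        raise ValueError("nworkers must be positive")
    base, extra = divmod(nblocks, nworkers)
    ranges: List[Tuple[int, int]] = []
    start = 0
    for w in range(nworkers):
        # The first `extra` workers take one additional block each.
        size = base + (1 if w < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges


print(assign_block_ranges(10, 3))
```

Under that assumption each worker starts reading at a different offset
in the same file, so the disk sees several interleaved streams rather
than one sequential one.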

> (3) add an indexscan or one or two other node types to the master-slave
> solution: this is in order to make the framework extensible;
> (4) parallelize these nodes - this will be a big chunk of job;

Now that could be a _big_ win! Especially if tablespaces are used to
balance commonly combined tables and indexes.

> (5) add a two-phase optimization to the server - we have to consider the
> partitioned table in this stage, yet another big chunk of job;
>

Same here. This would be a place where parallel seqscans of different
tables (instead of a multi-headed scan of one table) could buy you a
lot, especially with proper tablespace use.

Thanks again, Qingqing, for the work on this. I'm very excited about
where this could go. :)

> Regards,
> Qingqing
>

--
Mike Rylander
mrylander(at)gmail(dot)com
GPLS -- PINES Development
Database Developer
http://open-ils.org
