Re: parallelizing subplan execution (was: explain and PARAM_EXEC)

From: Mark Wong <markwkm(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Dimitri Fontaine <dfontaine(at)hi-media(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: parallelizing subplan execution (was: explain and PARAM_EXEC)
Date: 2010-07-01 03:24:18
Message-ID: AANLkTimj1c_8djn8VWzqgNPap9rT4LfOwl3TFIJSkEnH@mail.gmail.com
Lists: pgsql-hackers

On Sat, Jun 26, 2010 at 6:01 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> On Fri, Jun 25, 2010 at 10:47 PM, Mark Wong <markwkm(at)gmail(dot)com> wrote:
>> http://pages.cs.wisc.edu/~dewitt/includes/publications.html
>>
>> Some of these papers aren't the type of parallelism we're talking
>> about here, but the ones that I think are relevant talk mostly about
>> parallelizing hash based joins.  I think we might be lacking an
>> operator or two though in order to do some of these things.
>
> This part (from the first paper linked on that page) is not terribly
> encouraging.
>
> "Current database query optimizers do not consider all possible plans
> when optimizing a relational query. While cost models for relational
> queries running on a single processor are now well-understood
> [SELI79], they still depend on cost estimators that are a guess at
> best. Some dynamically select from among several plans at run time
> depending on, for example, the amount of physical memory actually
> available and the cardinalities of the intermediate results [GRAE89].
> To date, no query optimizers consider all the parallel algorithms for
> each operator and all the query tree organizations. More work is
> needed in this area."
>
> The section (from that same paper) on parallelizing hash joins and
> merge-join-over-sort is interesting, and I can definitely imagine
> those techniques being a win for us.  But I'm not too sure how we'd
> know when to apply them - that is, what algorithm would the query
> optimizer use?  I'm sure we could come up with something, but I'd get
> a warmer, fuzzier feeling if we could implement the fruits of someone
> else's research rather than rolling our own.

I found another starting point for more papers here:

http://infolab.stanford.edu/joker/joqrs.html

The links on this page don't work anymore, but many of the papers are
easily found by searching for their titles.  I've only gone through
some of the abstracts so far, but they appear to discuss query
optimization techniques for parallel systems.

Regards,
Mark