
Re: Parallel query execution

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel query execution
Date: 2013-01-15 23:08:47
Lists: pgsql-hackers

On Wed, Jan 16, 2013 at 12:03:50PM +1300, Gavin Flower wrote:
> On 16/01/13 11:14, Bruce Momjian wrote:
>     I mentioned last year that I wanted to start working on parallelism:
>
>     Years ago I added thread-safety to libpq.  Recently I added two parallel
>     execution paths to pg_upgrade.  The first parallel path allows execution
>     of external binaries pg_dump and psql (to restore).  The second parallel
>     path does copy/link by calling fork/thread-safe C functions.  I was able
>     to do each in 2-3 days.
>
>     I believe it is time to start adding parallel execution to the backend.
>     We already have some parallelism in the backend:
>     effective_io_concurrency and helper processes.  I think it is time we
>     start to consider additional options.
>
>     Parallelism isn't going to help all queries, in fact it might be just a
>     small subset, but it will be the larger queries.  The pg_upgrade
>     parallelism only helps clusters with multiple databases or tablespaces,
>     but the improvements are significant.
>
>     I have summarized my ideas by updating our Parallel Query Execution wiki
>     page:
>
>     Please consider updating the page yourself or posting your ideas to this
>     thread.  Thanks.
>
> Hmm...
> How about being aware of multiple spindles - so if the requested data covers
> multiple spindles, then data could be extracted in parallel.  This may, or may
> not, involve multiple I/O channels?

Well, we usually label these as tablespaces.  I don't know if
spindle-level is a reasonable level to add.

> On large multiple processor machines, there are different blocks of memory that
> might be accessed at different speeds depending on the processor.  Possibly a
> mechanism could be used to split a transaction over multiple processors to
> ensure the fastest memory is used?

That seems too far-out for an initial approach.

> Once a selection of rows has been made, if there is a lot of reformatting
> going on, could this be done in parallel?  I can think of 2 very
> simplistic strategies: (A) use a different processor core for each column, or
> (B) farm out sets of rows to different cores.  I am sure in reality, there are
> more subtleties, and aspects of both strategies will be used in a hybrid
> fashion along with other approaches.

Probably (B), but that is going to require making some of the modules
thread/fork-safe, and that is going to be tricky.
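
A minimal sketch of the "farm out sets of rows" strategy, in Python rather
than backend C and purely for illustration.  The row shape, format_row(),
and the chunk size are all hypothetical.  The caller supplies the executor;
a thread pool is used here for brevity, and a process pool would need the
per-row code to be safe to run in separate workers, which is exactly the
tricky part.

```python
from concurrent.futures import Executor, ThreadPoolExecutor
from typing import Iterator, List, Sequence, Tuple

Row = Tuple[int, str, float]  # hypothetical row shape: (id, name, amount)


def format_row(row: Row) -> str:
    """Stand-in for the per-row reformatting work (hypothetical)."""
    row_id, name, amount = row
    return f"{row_id}|{name.upper()}|{amount:.2f}"


def format_chunk(chunk: Sequence[Row]) -> List[str]:
    """Format one set of rows; this is the unit of work given to a worker."""
    return [format_row(row) for row in chunk]


def chunks(rows: Sequence[Row], size: int) -> Iterator[Sequence[Row]]:
    """Split rows into consecutive slices of at most `size` rows."""
    if size <= 0:
        raise ValueError("size must be positive")
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def format_rows_parallel(rows: Sequence[Row], executor: Executor,
                         chunk_size: int = 1000) -> List[str]:
    """Hand each set of rows to a worker and stitch the results together."""
    formatted: List[str] = []
    # Executor.map yields results in input order, so row order is preserved.
    for part in executor.map(format_chunk, chunks(rows, chunk_size)):
        formatted.extend(part)
    return formatted


sample_rows: List[Row] = [(i, f"name{i}", i * 1.5) for i in range(10)]
with ThreadPoolExecutor(max_workers=4) as pool:
    lines = format_rows_parallel(sample_rows, pool, chunk_size=3)
```

Splitting by sets of rows keeps each worker's task independent, so the only
coordination needed is reassembling the chunks in order.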

> I expect that before any parallel algorithm is invoked, some sort of
> threshold needs to be exceeded to make it worthwhile.  Different aspects of
> the parallel algorithm may have their own thresholds.  It may not be worth
> applying a parallel algorithm for 10 rows from a simple table, but selecting
> 10,000 records from multiple tables each over 10 million rows using joins may
> benefit from more extreme parallelism.

Right, I bet we will need some way to control when the overhead of
parallel execution is worth it.
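
One way to picture such a control, as a rough Python sketch and not a
proposal for actual backend code: compare an estimated serial cost with an
estimated parallel cost that includes a fixed startup charge.  The class
name, the three cost constants, and their default values are all made up
for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelCostModel:
    """Toy cost model; every constant here is a hypothetical placeholder."""
    startup_cost: float = 1000.0        # fixed cost of launching workers
    per_row_work_cost: float = 1.0      # cost to process one scanned row
    per_row_transfer_cost: float = 0.1  # cost to pass one result row back

    def serial_cost(self, rows_scanned: int) -> float:
        return rows_scanned * self.per_row_work_cost

    def parallel_cost(self, rows_scanned: int, rows_returned: int,
                      workers: int) -> float:
        # Work is assumed to divide evenly across workers; startup and
        # result transfer are overhead that a serial plan does not pay.
        work = rows_scanned * self.per_row_work_cost / workers
        transfer = rows_returned * self.per_row_transfer_cost
        return self.startup_cost + work + transfer

    def should_parallelize(self, rows_scanned: int, rows_returned: int,
                           workers: int) -> bool:
        if workers < 2:
            return False
        return (self.parallel_cost(rows_scanned, rows_returned, workers)
                < self.serial_cost(rows_scanned))


model = ParallelCostModel()
small = model.should_parallelize(rows_scanned=10, rows_returned=10, workers=4)
large = model.should_parallelize(rows_scanned=10_000_000,
                                 rows_returned=10_000, workers=4)
```

With these made-up constants the 10-row query stays serial because the
startup charge alone exceeds the serial cost, while the 10-million-row scan
comes out cheaper in parallel.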

> I expect that UNIONs, as well as the processing of partitioned tables, may be
> amenable to parallel processing.

Interesting idea on UNION.

  Bruce Momjian  <bruce(at)momjian(dot)us>

  + It's impossible for everything to be true. +
