
Re: Parallel query execution

From: Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel query execution
Date: 2013-01-15 23:03:50
Lists: pgsql-hackers
On 16/01/13 11:14, Bruce Momjian wrote:
> I mentioned last year that I wanted to start working on parallelism:
> Years ago I added thread-safety to libpq.  Recently I added two parallel
> execution paths to pg_upgrade.  The first parallel path allows execution
> of external binaries pg_dump and psql (to restore).  The second parallel
> path does copy/link by calling fork/thread-safe C functions.  I was able
> to do each in 2-3 days.
> I believe it is time to start adding parallel execution to the backend.
> We already have some parallelism in the backend:
> effective_io_concurrency and helper processes.  I think it is time we
> start to consider additional options.
> Parallelism isn't going to help all queries, in fact it might be just a
> small subset, but it will be the larger queries.  The pg_upgrade
> parallelism only helps clusters with multiple databases or tablespaces,
> but the improvements are significant.
> I have summarized my ideas by updating our Parallel Query Execution wiki
> page:
> Please consider updating the page yourself or posting your ideas to this
> thread.  Thanks.

How about being aware of multiple spindles?  If the requested data
spans multiple spindles, then it could be read from them in parallel.
This may, or may not, involve multiple I/O channels.

On large multiple processor machines, there are different blocks of 
memory that might be accessed at different speeds depending on the 
processor. Possibly a mechanism could be used to split a transaction 
over multiple processors to ensure the fastest memory is used?

Once a selection of rows has been made, if there is a lot of
reformatting going on, could this be done in parallel?  I can think of
2 very simplistic strategies: (A) use a different processor core for
each column, or (B) farm out sets of rows to different cores.  I am
sure that in reality there are more subtleties, and that aspects of
both strategies will be used in a hybrid fashion along with other
approaches.
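
To make strategy (B) concrete, here is a minimal sketch in Python.  It
illustrates the idea only and says nothing about how the backend would
do it.  The names format_row, format_chunk, chunked and
format_rows_parallel are hypothetical, and the "reformatting" is just
joining column values into text.  A thread pool stands in for "cores"
here; in CPython, CPU-bound work would need processes to actually run
on separate cores.

```python
from concurrent.futures import ThreadPoolExecutor


def format_row(row):
    """Hypothetical reformatting step: render every column as text."""
    return "|".join(str(col) for col in row)


def format_chunk(chunk):
    """Format one set of rows; this is the unit of work given to a worker."""
    return [format_row(row) for row in chunk]


def chunked(rows, size):
    """Yield consecutive slices of `rows`, each with at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]


def format_rows_parallel(rows, workers=4, chunk_size=1000):
    """Strategy (B): farm out sets of rows to different workers."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in submission order, so output order
        # matches input order even if chunks finish out of order.
        results = pool.map(format_chunk, chunked(rows, chunk_size))
        return [line for chunk in results for line in chunk]
```

Strategy (A) would split the same work by column instead of by row
range, which only pays off when individual columns are expensive to
format.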

I expect that before any parallel algorithm is invoked, some sort of
threshold needs to be exceeded to make it worthwhile.  Different
aspects of the parallel algorithm may have their own thresholds.  It
may not be worth applying a parallel algorithm for 10 rows from a
simple table, but selecting 10,000 records from multiple tables, each
over 10 million rows, using joins may benefit from more extreme
parallelism.
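
As a rough sketch of such a threshold, assuming the planner has a row
estimate to work from: the function below stays serial under a minimum
row count and otherwise scales the number of workers with the
estimate, up to a cap.  The function name, the parameter names and all
the default values are made up for illustration; real thresholds would
have to come from measurement.

```python
def choose_worker_count(estimated_rows,
                        min_parallel_rows=10_000,
                        rows_per_worker=50_000,
                        max_workers=8):
    """Return how many workers to use for a scan of `estimated_rows` rows.

    All thresholds here are illustrative assumptions, not tuned values.
    """
    if estimated_rows < min_parallel_rows:
        # Too little work: start-up and coordination would cost more
        # than parallelism could save, so stay serial.
        return 1
    # Ceiling division: one worker per `rows_per_worker` rows.
    wanted = -(-estimated_rows // rows_per_worker)
    # Once parallel, use at least 2 workers and never exceed the cap.
    return max(2, min(max_workers, wanted))
```

With these defaults, choose_worker_count(10) stays serial, while
choose_worker_count(10_000_000) is capped at 8 workers.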

I expect that UNIONs, as well as the processing of partitioned tables, 
may be amenable to parallel processing.
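
A small sketch of why these cases look amenable: each UNION branch or
partition can be scanned independently and the results combined
afterwards.  The code below is illustrative only; scan_partition,
parallel_union_all and parallel_union are hypothetical names,
partitions are plain lists of tuples, and a thread pool stands in for
whatever worker mechanism would really be used.

```python
from concurrent.futures import ThreadPoolExecutor


def scan_partition(partition, predicate):
    """Scan one partition (or UNION branch) and keep the matching rows."""
    return [row for row in partition if predicate(row)]


def parallel_union_all(partitions, predicate, workers=4):
    """UNION ALL semantics: concatenate per-partition results as they are."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scan_partition, part, predicate)
                   for part in partitions]
        combined = []
        # Collect in submission order so the output is deterministic.
        for future in futures:
            combined.extend(future.result())
        return combined


def parallel_union(partitions, predicate, workers=4):
    """UNION semantics: same scan, then drop duplicate rows.

    Rows must be hashable (e.g. tuples); first-seen order is kept.
    """
    rows = parallel_union_all(partitions, predicate, workers)
    return list(dict.fromkeys(rows))
```

The duplicate removal for plain UNION is a serial step here; it is the
per-branch scanning that parallelises trivially.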

