On Wed, Jan 16, 2013 at 12:03:50PM +1300, Gavin Flower wrote:
> On 16/01/13 11:14, Bruce Momjian wrote:
> > I mentioned last year that I wanted to start working on parallelism:
> > Years ago I added thread-safety to libpq. Recently I added two parallel
> > execution paths to pg_upgrade. The first parallel path allows execution
> > of external binaries pg_dump and psql (to restore). The second parallel
> > path does copy/link by calling fork/thread-safe C functions. I was able
> > to do each in 2-3 days.
> > I believe it is time to start adding parallel execution to the backend.
> > We already have some parallelism in the backend:
> > effective_io_concurrency and helper processes. I think it is time we
> > start to consider additional options.
> > Parallelism isn't going to help all queries; in fact, it might help just a
> > small subset, but those will be the larger queries. The pg_upgrade
> > parallelism only helps clusters with multiple databases or tablespaces,
> > but the improvements are significant.
> > I have summarized my ideas by updating our Parallel Query Execution wiki
> > page. Please consider updating the page yourself or posting your ideas to
> > this thread. Thanks.
> How about being aware of multiple spindles - so if the requested data covers
> multiple spindles, then data could be extracted in parallel. This may, or may
> not, involve multiple I/O channels?
Well, we usually label these as tablespaces. I don't know if
spindle-level is a reasonable level to add.
> On large multiple processor machines, there are different blocks of memory that
> might be accessed at different speeds depending on the processor. Possibly a
> mechanism could be used to split a transaction over multiple processors to
> ensure the fastest memory is used?
That seems too far-out for an initial approach.
> Once a selection of rows has been made, if there is a lot of reformatting
> going on, could this be done in parallel? I can think of 2 very
> simplistic strategies: (A) use a different processor core for each column, or
> (B) farm out sets of rows to different cores. I am sure that in reality there
> are more subtleties, and aspects of both strategies will be used in a hybrid
> fashion along with other approaches.
Probably (B), but that is going to require making some of the modules
thread/fork-safe, and that is going to be tricky.
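As a rough illustration of strategy (B), here is a minimal Python sketch that
splits a result set into chunks of rows and hands each chunk to a pool of
workers. Everything in it is an assumption made for illustration: format_row,
chunked, format_chunk and format_rows_parallel are hypothetical names, and a
thread pool stands in for whatever worker mechanism the backend might use. In
CPython, threads would not actually speed up pure-Python formatting; the point
is only the shape of the work split.

```python
from concurrent.futures import ThreadPoolExecutor


def format_row(row):
    # Hypothetical "reformatting" step: render every column as text.
    return "|".join(str(col) for col in row)


def chunked(rows, chunk_size):
    # Split the rows into consecutive chunks of at most chunk_size rows.
    return [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]


def format_chunk(chunk):
    # Work done by one worker: format every row in its chunk.
    return [format_row(row) for row in chunk]


def format_rows_parallel(rows, workers=4, chunk_size=1000):
    chunks = chunked(rows, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in chunk order, so the output order
        # matches the input order regardless of which worker finishes first.
        formatted_chunks = pool.map(format_chunk, chunks)
        return [line for chunk in formatted_chunks for line in chunk]
```

Each worker only touches its own chunk, which is why the formatting code
itself has to be safe to run concurrently, the tricky part noted above.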
> I expect that before any parallel algorithm is invoked, some sort of
> threshold needs to be exceeded to make it worthwhile. Different aspects of
> the parallel algorithm may have their own thresholds. It may not be worth
> applying a parallel algorithm for 10 rows from a simple table, but selecting
> 10,000 records from multiple tables each over 10 million rows using joins may
> benefit from more extreme parallelism.
Right, I bet we will need some way to control when the overhead of
parallel execution is worth it.
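One way to picture such a control is a simple cost comparison: go parallel
only when the estimated serial cost exceeds the cost of starting the workers
plus each worker's share of the work. The Python sketch below is purely
illustrative; should_parallelize, its parameters and the min_rows cutoff are
hypothetical and do not describe any existing planner setting.

```python
def should_parallelize(estimated_rows, cost_per_row, workers,
                       worker_startup_cost, min_rows=10_000):
    # Hypothetical rule: small inputs or a single worker never go parallel.
    if workers < 2 or estimated_rows < min_rows:
        return False
    serial_cost = estimated_rows * cost_per_row
    # Assumes the work divides evenly and each worker pays a fixed startup cost.
    parallel_cost = workers * worker_startup_cost + serial_cost / workers
    return parallel_cost < serial_cost
```

With these assumed numbers, a 10-row query stays serial while a scan of
10 million rows with four workers goes parallel, because the startup overhead
is small compared with the work saved.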
> I expect that UNIONs, as well as the processing of partitioned tables, may be
> amenable to parallel processing.
Interesting idea on UNION.
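To make the UNION idea concrete, here is a minimal Python sketch, assuming
each branch of the UNION can be evaluated independently: every branch runs in
its own worker, and the results are merged with duplicates removed, as plain
UNION requires. parallel_union and the callables passed to it are hypothetical
stand-ins for branch executors, not an existing interface.

```python
from concurrent.futures import ThreadPoolExecutor


def parallel_union(branches, workers=4):
    # Run every branch concurrently; each branch is a callable returning rows.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda branch: branch(), branches))

    # Merge step: UNION (without ALL) keeps one copy of each distinct row.
    seen = set()
    merged = []
    for rows in results:
        for row in rows:
            if row not in seen:
                seen.add(row)
                merged.append(row)
    return merged
```

The branches share no state until the merge step, so only the final
de-duplication has to run in one place.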
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
+ It's impossible for everything to be true. +