Re: Parallel Seq Scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: John Gorman <johngorman2(at)gmail(dot)com>
Cc: Stephen Frost <sfrost(at)snowman(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-01-14 21:25:52
Message-ID: CA+Tgmoaoj8kf6ft9O1E=T3+XCrRoKr4sWBVfoXdzFaDCH+=M+Q@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 13, 2015 at 6:25 AM, John Gorman <johngorman2(at)gmail(dot)com> wrote:
> One approach that has worked well for me is to break big jobs into much
> smaller bite-size tasks. Each task is small enough to complete quickly.
>
> We add the tasks to a task queue and spawn a generic worker pool which eats
> through the task queue items.
>
> This solves a lot of problems.
>
> - Small to medium jobs can be parallelized efficiently.
> - No need to split big jobs perfectly.
> - We don't get into a situation where we are waiting around for a worker to
> finish chugging through a huge task while the other workers sit idle.
> - Worker memory footprint is tiny so we can afford many of them.
> - Worker pool management is a well known problem.
> - Worker spawn time disappears as a cost factor.
> - The worker pool becomes a shared resource that can be managed and reported
> on and becomes considerably more predictable.
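
For concreteness, the queue-plus-pool pattern described above could be
sketched roughly as follows. This is only an illustration in Python using
threads, not PostgreSQL code (PostgreSQL workers are separate processes),
and the names run_task_pool and make_chunk_tasks, the chunk size, and the
idea of a task being a range of blocks are all assumptions made up for the
example.

```python
import queue
import threading


def run_task_pool(tasks, num_workers=4):
    """Run small callables from one shared queue on a fixed pool of workers."""
    task_queue = queue.Queue()
    results = []
    results_lock = threading.Lock()

    # Enqueue everything up front; workers only ever consume.
    for task in tasks:
        task_queue.put(task)

    def worker():
        while True:
            try:
                task = task_queue.get_nowait()
            except queue.Empty:
                return  # queue drained: this worker is done
            outcome = task()
            with results_lock:
                results.append(outcome)

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


def make_chunk_tasks(total_blocks, chunk_size):
    """Split a scan over total_blocks into small (start, end, count) tasks.

    Hypothetical stand-in for "scan blocks start..end"; a real task would
    do the work instead of just reporting the range it was given.
    """
    tasks = []
    for start in range(0, total_blocks, chunk_size):
        end = min(start + chunk_size, total_blocks)
        # Bind start/end as defaults so each lambda keeps its own range.
        tasks.append(lambda s=start, e=end: (s, e, e - s))
    return tasks
```

Because every worker pulls the next small chunk as soon as it finishes the
previous one, no up-front split has to be perfect and no worker sits idle
while another grinds through one huge piece. Results arrive in completion
order, so a caller that needs block order would have to sort them.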

I think this is a good idea, but for now I would like to keep our
goals somewhat more modest: let's see if we can get parallel
sequential scan, and only parallel sequential scan, working and
committed. Ultimately, I think we may need something like what you're
talking about, because if you have a query with three or six or twelve
different parallelizable operations in it, you want the available CPU
resources to switch between those as their respective needs may
dictate. You certainly don't want to spawn a separate pool of workers
for each scan.

But I think getting that all working in the first version is probably
harder than what we should attempt. We have a bunch of problems to
solve here just around parallel sequential scan and the parallel mode
infrastructure: heavyweight locking, prefetching, the cost model, and
so on. Trying to add to that all of the problems that might attend on
a generic task queueing infrastructure fills me with no small amount
of fear.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
