
Re: Parallel Seq Scan

From:	John Gorman <johngorman2(at)gmail(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Stephen Frost <sfrost(at)snowman(dot)net>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Parallel Seq Scan
Date:	2015-01-13 11:25:10
Message-ID:	CALkS6B_HBPPzSWuUQsS_S=OD-WtkRc9j2C+LubgDqJ05gigrug@mail.gmail.com
Lists:	pgsql-hackers

On Sun, Jan 11, 2015 at 6:00 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:

> On Sun, Jan 11, 2015 at 6:01 AM, Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> > So, for my 2c, I've long expected us to parallelize at the relation-file
> > level for these kinds of operations. This goes back to my other
> > thoughts on how we should be thinking about parallelizing inbound data
> > for bulk data loads but it seems appropriate to consider it here also.
> > One of the issues there is that 1G still feels like an awful lot for a
> > minimum work size for each worker and it would mean we don't parallelize
> > for relations less than that size.
>
> Yes, I think that's a killer objection.

One approach that has worked well for me is to break big jobs into much
smaller, bite-size tasks. Each task is small enough to complete quickly.
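For example, a relation could be carved into small fixed-size block ranges instead of 1 GB segments. The sketch below assumes PostgreSQL's default 8 kB block size and an arbitrary, hypothetical chunk size of 64 blocks. `split_into_tasks` is an illustrative helper, not anything in the PostgreSQL source.

```python
def split_into_tasks(nblocks: int, chunk_blocks: int) -> list[tuple[int, int]]:
    """Split blocks [0, nblocks) into half-open ranges of at most chunk_blocks blocks."""
    if chunk_blocks <= 0:
        raise ValueError("chunk_blocks must be positive")
    return [(start, min(start + chunk_blocks, nblocks))
            for start in range(0, nblocks, chunk_blocks)]

# A 1 GB segment with 8 kB blocks holds 131072 blocks -> 2048 small tasks.
print(len(split_into_tasks(131072, 64)))  # 2048

# A relation far smaller than 1 GB still yields several parallelizable tasks.
print(split_into_tasks(200, 64))  # [(0, 64), (64, 128), (128, 192), (192, 200)]
```

The chunk size trades scheduling overhead (more tasks) against load balance (smaller tasks); 64 is only a placeholder.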

We add the tasks to a task queue and spawn a generic worker pool which eats
through the task queue items.
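A minimal sketch of a task queue drained by a generic worker pool, using Python's standard `queue` and `threading` modules. The names `run_pool` and `work_fn` are hypothetical, and the tasks are plain (start, end) block ranges standing in for real scan work.

```python
import queue
import threading

def run_pool(tasks, work_fn, nworkers=4):
    """Load every task into a shared queue, then let nworkers threads drain it."""
    q = queue.Queue()
    for task in tasks:
        q.put(task)

    results = []
    results_lock = threading.Lock()

    def worker():
        # Each worker keeps pulling the next task until the queue is empty,
        # so a fast worker simply ends up processing more tasks than a slow one.
        while True:
            try:
                task = q.get_nowait()
            except queue.Empty:
                return  # all tasks were enqueued up front, so empty means done
            result = work_fn(task)
            with results_lock:
                results.append(result)

    threads = [threading.Thread(target=worker) for _ in range(nworkers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

# Hypothetical work: "scan" a block range by counting the blocks in it.
block_ranges = [(start, min(start + 64, 200)) for start in range(0, 200, 64)]
counts = run_pool(block_ranges, lambda r: r[1] - r[0], nworkers=3)
print(sum(counts))  # 200
```

Because every task is enqueued before the workers start, an empty queue means the job is finished. A long-lived shared pool that keeps accepting new jobs would instead block on `q.get()` and stop on a sentinel or shutdown signal.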

This solves a lot of problems.

- Small to medium jobs can be parallelized efficiently.
- No need to split big jobs perfectly.
- We don't get into a situation where we are waiting around for a worker to
finish chugging through a huge task while the other workers sit idle.
- Worker memory footprint is tiny so we can afford many of them.
- Worker pool management is a well-known problem.
- Worker spawn time disappears as a cost factor.
- The worker pool becomes a shared resource that can be managed and
reported on, and its behavior is considerably more predictable.
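To illustrate the idle-worker point, here is a small deterministic simulation (not a PostgreSQL measurement) comparing one large up-front chunk per worker with workers that pull small tasks from a shared queue. The per-task costs are made-up numbers in which the first quarter of the relation is ten times more expensive to process.

```python
import heapq

def static_makespan(costs, nworkers):
    """Finish time when each worker gets one contiguous chunk of tasks up front."""
    if not costs:
        return 0
    size = -(-len(costs) // nworkers)  # ceiling division: tasks per chunk
    return max(sum(costs[i:i + size]) for i in range(0, len(costs), size))

def dynamic_makespan(costs, nworkers):
    """Finish time when each idle worker pulls the next task from a shared FIFO queue."""
    free_at = [0] * nworkers  # min-heap: time at which each worker becomes idle
    for cost in costs:
        t = heapq.heappop(free_at)      # earliest-idle worker takes the next task
        heapq.heappush(free_at, t + cost)
    return max(free_at)

# Hypothetical costs: 4 expensive tasks followed by 12 cheap ones.
costs = [10] * 4 + [1] * 12
print(static_makespan(costs, 4))   # 40
print(dynamic_makespan(costs, 4))  # 13
```

With four workers, the static split finishes at time 40 while three workers sit idle after time 4; the queue-driven pool finishes at 13, which is the total work (52) divided evenly across four workers.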

In response to

Re: Parallel Seq Scan at 2015-01-11 22:00:13 from Robert Haas

Responses

Re: Parallel Seq Scan at 2015-01-13 12:08:41 from John Gorman
Re: Parallel Seq Scan at 2015-01-14 03:42:57 from Amit Kapila
Re: Parallel Seq Scan at 2015-01-14 21:25:52 from Robert Haas
