Re: Parallel Seq Scan

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-02-07 21:30:29
Message-ID: 20150207213029.GG9201@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015-02-06 22:57:43 -0500, Robert Haas wrote:
> On Fri, Feb 6, 2015 at 2:13 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> > My first comment here is that I think we should actually teach
> > heapam.c about parallelism.
>
> I coded this up; see attached. I'm also attaching an updated version
> of the parallel count code revised to use this API. It's now called
> "parallel_count" rather than "parallel_dummy" and I removed some
> stupid stuff from it. I'm curious to see what other people think, but
> this seems much cleaner to me. With the old approach, the
> parallel-count code was duplicating some of the guts of heapam.c and
> dropping the rest on the floor; now it just asks for a parallel scan
> and away it goes. Similarly, if your parallel-seqscan patch wanted to
> scan block-by-block rather than splitting the relation into equal
> parts, or if it wanted to participate in the synchronized-seqcan
> stuff, there was no clean way to do that. With this approach, those
> decisions are - as they quite properly should be - isolated within
> heapam.c, rather than creeping into the executor.

I'm not convinced that that reasoning is generally valid. While it may
work out nicely for seqscans - which might be useful enough on its own -
the more stuff we parallelize the *more* the executor will have to know
about it to make it sane. To actually scale nicely e.g. a parallel sort
will have to execute the nodes below it on each backend, instead of
doing that in one as a separate step, ferrying over all tuples to
indivdual backends through queues, and only then parallezing the
sort.

Now. None of that is likely to matter immediately, but I think starting
to build the infrastructure at the points where we'll later need it does
make some sense.

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2015-02-07 21:32:58 Re: Cube extension kNN support
Previous Message Robert Haas 2015-02-07 21:07:45 perplexing error message