Re: Parallel Seq Scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-02-06 17:34:29
Message-ID: CA+TgmobApqez-4b9wJbh5CqBY+jbUD2Y9_ojArg5bN=wNwmJVw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Feb 6, 2015 at 9:43 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Here is the latest patch which fixes reported issues and supported
> Prepared Statements and Explain Statement for parallel sequential
> scan.
>
> The main purpose is to get the feedback if possible on overall
> structure/design of code before I goahead.

I'm not very happy with the way this is modularized:

1. The new parallel sequential scan node runs only in the master. The
workers are running a regular sequential scan with a hack to make them
scan a subset of the blocks. I think this is wrong; parallel
sequential scan shouldn't require this kind of modifications to the
non-parallel case.

2. InitiateWorkers() is entirely specific to the concerns of parallel
sequential scan. After looking this over, I think there are three
categories of things that need to be clearly separated. Some stuff is
going to be needed for any parallel query; some stuff is going to be
needed only for parallel scans but will be needed for any type of
parallel scan, not just parallel sequential scan[1]; some stuff is
needed for any type of node that returns tuples but not for nodes that
don't return tuples (e.g. needed for ParallelSeqScan and
ParallelHashJoin, but not needed for ParallelHash); and some stuff is
only going to be needed for parallel sequential scan specifically.
This patch mixes all of those concerns together in a single function.
That won't do; this needs to be easily extensible to whatever someone
wants to parallelize next.

3. I think the whole idea of using the portal infrastructure for this
is wrong. We've talked about this before, but the fact that you're
doing it this way is having a major impact on the whole design of the
patch, and I can't believe it's ever going to be committable this way.
To create a portal, you have to pretend that you received a protocol
message, which you didn't; and you have to pretend there is an SQL
query so you can call PortalDefineQuery. That's ugly. As far as I
can see the only thing we really get out of any of that is that we can
use the DestReceiver stuff to get the tuples back to the master, but
that doesn't really work either, because you're having to hack
printtup.c anyway. So from my point of view you're going through a
bunch of layers that really don't have any value. Considering the way
the parallel mode patch has evolved, I no longer think there's much
point to passing anything other than raw tuples between the backends,
so the whole idea of going through a deform/send/recv/form cycle seems
like something we can entirely skip.

4.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] It is of course arguable whether a parallel index-scan or parallel
bitmap index-scan or parallel index-only-scan or parallel custom scan
makes sense, but this patch shouldn't assume that we won't want to do
those things. We have other places in the code that know about the
concept of a scan as opposed to some other kind of executor construct,
and we should preserve that distinction here.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2015-02-06 19:13:33 Re: Parallel Seq Scan
Previous Message Tom Lane 2015-02-06 16:09:32 Re: ExplainModifyTarget doesn't work as expected