Re: Parallel Seq Scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Gavin Flower <GavinFlower(at)archidevsys(dot)co(dot)nz>, Jeff Davis <pgsql(at)j-davis(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-09-30 23:31:56
Message-ID: CA+TgmoYhHGOh6ktTJWMW6R9Fi8qhZzHBAf+Cfe5TdXWnnZ92cQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Sep 30, 2015 at 11:23 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>> - I don't believe that shm_toc *toc has any business being part of a
>> generic PlanState node. At most, it should be part of an individual
>> type of PlanState, like a GatherState or PartialSeqScanState. But
>> really, I don't see why we need it there at all.
>
> We need it for getting parallelheapscan descriptor in case of
> partial sequence scan node, it doesn't seem like a good idea
> to retrieve it in begining, as we need to dig into plan tree to
> get the node_id for getting the value of parallelheapscan descriptor
> from toc.
>
> Now, I think we can surely keep it in PartialSeqScanState or any
> other node state which might need it later, but I felt this is quite
> generic and we might need to fetch node specific information from toc
> going forward.

It's true that the PartialSeqScanState will need a way to get at the
toc, but I don't think that means we should stash it in the PlanState.
I've taken that part out for now.

>> - I think that a Gather node should inherit from Plan, not Scan. A
>> Gather node really shouldn't have a scanrelid. Now, admittedly, if
>> the only thing under the Gather is a Partial Seq Scan, it wouldn't be
>> totally bonkers to think of the Gather as scanning the same relation
>> that the Partial Seq Scan is scanning. But in any more complex case,
>> like where it's scanning a join, you're out of luck. You'll have to
>> set scanrelid == 0, I suppose, but then, for example, ExecScanReScan
>> is not going to work. In fact, as far as I can see, the only way
>> nodeGather.c is actually using any of the generic scan stuff is by
>> calling ExecInitScanTupleSlot, which is all of one line of code.
>> ExecEndGather fetches node->ss.ss_currentRelation but then does
>> nothing with it. So I think this is just a holdover from early
>> version of this patch where what's now Gather and PartialSeqScan were
>> a single node, and I think we should rip it out.
>
> makes sense and I think GatherState should also be inherit from PlanState
> instead of ScanState which I have changed in patch attached.

You missed a number of things while doing this - I cleaned them up.

>> - Also, I think this stuff about physical tlists in
>> create_gather_plan() is bogus. use_physical_tlist is ignorant of the
>> possibility that the RelOptInfo passed to it might be anything other
>> than a baserel, and I think it won't be happy if it gets a joinrel.
>> Moreover, I think our plan here is that, at least for now, the
>> Gather's tlist will always match the tlist of its child. If that's
>> so, there's no point to this: it will end up with the same tlist
>> either way. If any projection is needed, it should be done by the
>> Gather node's child, not the Gather node itself.
>
> Yes, Gather node itself doesn't need to do projection, but it
> needs the projection info to store the same in Slot after fetching
> the tuple from tuple queue. Now this is not required for Gather
> node itself, but it might be required for any node on top of
> Gather node.
>
> Here, I think one thing we could do is that use the subplan's target
> list as currently is being done for quals. The only risk is what if
> Gating node is added on top of partialseqscan (subplan), but I have checked
> that is safe, because Gating plan uses the same target list as it's child.
> Also I don't think we need to process any quals at Gather node, so I will
> make that as Null, I will do this change in next version unless you see
> any problem with it.
>
> Yet another idea is during set_plan_refs(), we can assign leftchild's
> target list to parent in case of Gather node (right now it's done in
> reverse way which needs to be changed.)
>
> What is your preference?

I made it work like other nodes that inherit their left child's target list.

I made a few other changes as well:

- I wrote documentation for the GUCs. This probably needs to be
expanded once we get the whole feature in, but it's something.

- I added a new single_copy option to the gather. A single-copy
gather never tries to execute the plan itself, unless it can't get any
workers. This is very handy for testing, since it lets you stick a
Gather node on top of an arbitrary plan and, if everything's working,
it should work just as if the Gather node weren't there. I did a bit
of minor fiddling with the contents of the GatherState to make this
work. It's also useful in real life, since somebody can stick a
single-copy Gather node into a plan someplace and run everything below
that in a worker.

- I fixed a bug in ExecGather - you were testing whether
node->pei->pcxt is NULL, which seg faults on the first time through.
The correct thing is to node->pei.

- Assorted cosmetic changes.

- I again left out the early-executor-stop stuff, preferring to leave
that for a separate commit.

That done, I have committed this.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2015-09-30 23:36:51 Re: PATCH: index-only scans with partial indexes
Previous Message Robert Haas 2015-09-30 23:29:30 pgsql: Add a Gather executor node.