Re: [HACKERS] [postgresql 10 beta3] unrecognized node type: 90

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, "Adam, Etienne (Nokia-TECH/Issy Les Moulineaux)" <etienne(dot)adam(at)nokia(dot)com>, PostgreSQL Bugs <pgsql-bugs(at)postgresql(dot)org>, "Duquesne, Pierre (Nokia-TECH/Issy Les Moulineaux)" <pierre(dot)duquesne(at)nokia(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] [postgresql 10 beta3] unrecognized node type: 90
Date: 2017-08-18 10:52:04
Message-ID: CAA4eK1LEa_xu9wMLT=-emDBi8bqLBQhUbnNOp4vj84f854XnOA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Fri, Aug 18, 2017 at 3:50 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> I wrote:
>> Ah-hah, I see my dromedary box is one of the ones failing, so I'll
>> have a look there. I can't reproduce it on my other machines.
>
> OK, so this is a whole lot more broken than I thought :-(.
>
> Bear in mind that the plan for this (omitting uninteresting detail) is
>
> Nested Loop Left Join
> -> Values Scan on "*VALUES*"
> -> Finalize GroupAggregate
> -> Gather Merge
> -> Partial GroupAggregate
> -> Sort
> -> Parallel Seq Scan on tenk1
>
> What seems to be happening is that:
>
> 1. On the first pass, the parallel seqscan work gets doled out to several
> workers, plus the leader, as expected.
>
> 2. When the nestloop rescans its right input, ExecReScanGatherMerge
> supposes that this is good enough to handle rescanning its subnodes:
>
> ExecReScan(node->ps.lefttree);
>
> Leaving aside the question of why that doesn't look like nearly every
> other child rescan call, what happens is that that invokes ExecReScanAgg,
> which does the more usual thing:
>
> if (outerPlan->chgParam == NULL)
> ExecReScan(outerPlan);
>
> and so that invokes ExecReScanSort, and then behold what ExecReScanSort
> thinks it can optimize away:
>
> * If subnode is to be rescanned then we forget previous sort results; we
> * have to re-read the subplan and re-sort. Also must re-sort if the
> * bounded-sort parameters changed or we didn't select randomAccess.
> *
> * Otherwise we can just rewind and rescan the sorted output.
>
> So we never get to ExecReScanSeqScan, and thus not to heap_rescan,
> with the effect that parallel_scan->phs_nallocated never gets reset.
>
> 3. On the next pass, we fire up all the workers as expected, but they all
> perceive phs_nallocated >= rs_nblocks and conclude they have nothing to
> do. Meanwhile, in the leader, nodeSort just re-emits the sorted data it
> had last time. Net effect is that the GatherMerge node returns only the
> fraction of the data that was scanned by the leader in the first pass.
>
> 4. The fact that the test succeeds on many machines implies that the
> leader process is usually doing *all* of the work. This is in itself not
> very good. Even on the machines where it fails, the fact that the tuple
> counts are usually a pretty large fraction of the expected values
> indicates that the leader usually did most of the work. We need to take
> a closer look at why that is.
>
> However, the bottom line here is that parallel scan is completely broken
> for rescans, and it's not (solely) the fault of nodeGatherMerge; rather,
> the issue is that nobody bothered to wire up parallelism to the rescan
> parameterization mechanism.
>

I think we don't generate parallel plans for parameterized paths, so I
am not sure whether any work is required in that area.

> I imagine that related bugs can be
> demonstrated in 9.6 with little effort.
>
> I think that the correct fix probably involves marking each parallel scan
> plan node as dependent on a pseudo executor parameter, which the parent
> Gather or GatherMerge node would flag as being changed on each rescan.
> This would cue the plan layers in between that they cannot optimize on the
> assumption that the leader's instance of the parallel scan will produce
> exactly the same rows as it did last time, even when "nothing else
> changed". The "wtParam" pseudo parameter that's used for communication
> between RecursiveUnion and its descendant WorkTableScan node is a good
> model for what needs to happen.
>

Yeah, that seems like a good idea. I think another way could be to
*not* optimize rescanning when we are in parallel mode
(IsInParallelMode()), that might be restrictive as compared to what
you are suggesting, but will be somewhat simpler.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Browse pgsql-bugs by date

  From Date Subject
Next Message Peter Geoghegan 2017-08-18 19:36:16 Re: Crash report for some ICU-52 (debian8) COLLATE and work_mem values
Previous Message Vladimir Svedov 2017-08-18 10:52:00 seems column(alias_name) is a valid syntax? I cant find anything alike in docs

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2017-08-18 11:18:02 Re: why not parallel seq scan for slow functions
Previous Message Michael Banck 2017-08-18 09:28:33 Re: Create replication slot in pg_basebackup if requested and not yet present