Re: Parallel Seq Scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Thom Brown <thom(at)linux(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-11-11 19:26:56
Message-ID: CA+TgmoYksKPHi1dMfa0HcUgLNRoqCMF_goGiPK1MxgJakfq5Lg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 11, 2015 at 12:59 PM, Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com> wrote:
> I have a first query
>
> I looked on EXPLAIN ANALYZE output and the numbers of filtered rows are
> differen

Hmm, I see I was right about people finding more bugs once this was
committed. That didn't take long.

There's supposed to be code to handle this - see the
SharedPlanStateInstrumentation stuff in execParallel.c - but it's
evidently a few bricks shy of a load.
ExecParallelReportInstrumentation is supposed to transfer the counts
from each worker to the DSM:

ps_instrument = &instrumentation->ps_instrument[i];
SpinLockAcquire(&ps_instrument->mutex);
InstrAggNode(&ps_instrument->instr, planstate->instrument);
SpinLockRelease(&ps_instrument->mutex);

And ExecParallelRetrieveInstrumentation is supposed to slurp those
counts back into the leader's PlanState objects:

/* No need to acquire the spinlock here; workers have exited already. */
ps_instrument = &instrumentation->ps_instrument[i];
InstrAggNode(planstate->instrument, &ps_instrument->instr);

This might be a race condition, or it might be just wrong logic.
Could you test what happens if you insert something like a 1-second
sleep in ExecParallelFinish just after the call to
WaitForParallelWorkersToFinish()? If that makes the results
consistent, this is a race. If it doesn't, something else is wrong:
then it would be useful to know whether the workers are actually
calling ExecParallelReportInstrumentation, and whether the leader is
actually calling ExecParallelRetrieveInstrumentation, and if so
whether they are doing it for the correct set of nodes.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavel Stehule 2015-11-11 19:46:11 Re: Parallel Seq Scan
Previous Message Tom Lane 2015-11-11 19:24:33 Re: Patch to install config/missing