Re: Parallel Seq Scan

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-10-13 21:59:56
Message-ID: CA+TgmoZum+Yr3Q8b1kV4J-0=87ZX3eDgr+vNJq0VLnoG3GT6pw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Oct 13, 2015 at 2:45 AM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> Attached is rebased patch for partial seqscan support.

Review comments:

- If you're going to pgindent execParallel.c, you need to add some
entries to typedefs.list so it doesn't mangle the formatting.
ExecParallelEstimate's parameter list is misformatted, for example.
Also, I think if we're going to do this we had better extract the
pgindent changes and commit those first. It's pretty distracting the
way you have it.

- Instead of inlining the work needed by each parallel mode in
ExecParallelEstimate(), I think you should mimic the style of
ExecProcNode and call a node-type specific function that is part of
that node's public interface - here, ExecPartialSeqScanEstimate,
perhaps. Similarly for ExecParallelInitializeDSM. Perhaps
ExecPartialSeqScanInitializeDSM.

- I continue to think GetParallelShmToc is the wrong approach.
Instead, each time ExecParallelInitializeDSM or
ExecParallelInitializeDSM calls a nodetype-specific initialized
function (as described in the previous point), have it pass d->pcxt as
an argument. The node can get the toc from there if it needs it. I
suppose it could store a pointer to the toc in its scanstate if it
needs it, but it really shouldn't. Instead, it should store a pointer
to, say, the ParallelHeapScanDesc in the scanstate. It really should
only care about its own shared data, so once it finds that, the toc
shouldn't be needed any more. Then ExecPartialSeqScan doesn't need to
look up pscan; it's already recorded in the scanstate.

- ExecParallelInitializeDSMContext's new pscan_len member is 100%
wrong. Individual scan nodes don't get to add stuff to that context
object. They should store details like this in their own ScanState as
needed.

- The positioning of the new nodes in various lists doesn't seem to
entirely consistent. nodes.h adds them after SampleScan which isn't
terrible, though maybe immediately after SeqScan would be better, but
_outNode has it right after BitmapOr and the switch in _copyObject has
it somewhere else again.

- Although the changes in parallelpaths.c are in a good direction, I'm
pretty sure this is not yet up to scratch. I am less sure exactly
what needs to be fixed, so I'll have to give some more thought to
that.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amir Rohan 2015-10-13 22:03:16 Re: Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files
Previous Message David Rowley 2015-10-13 21:28:24 Re: Performance improvement for joins where outer side is unique