Re: [DESIGN] ParallelAppend

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Re: [DESIGN] ParallelAppend
Date: 2015-08-25 03:53:52
Message-ID: CAA4eK1LNt6wQBCxKsMj_QC+GahBuwyKWsQn6UL3nWVQ2savzwg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Aug 25, 2015 at 6:19 AM, Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com> wrote:
>
> > On Fri, Aug 21, 2015 at 7:40 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >
> > It could be possible, but let me summarize what I thought would be
> > required for the above use case. For Parallel Append, we need to push
> > multiple planned statements, in contrast to the one planned statement
> > pushed by the current patch, and then one or more parallel workers need
> > to work on each planned statement. So if we know in advance how many
> > planned statements we are passing down (which we should), then using
> > ParallelWorkerNumber (ParallelWorkerNumber % num_planned_statements or
> > some other similar way), workers can find the planned statement on which
> > they need to work, and similarly the information for PartialSeqScan
> > (which currently is the parallel heap scan descriptor information).
> >
> My problem is that we have no identifier that points to a particular element
> in the TOC segment, even if PARALLEL_KEY_PLANNEDSTMT or PARALLEL_KEY_SCAN can
> have multiple items.
> Please assume a situation where ExecPartialSeqScan() has to look up
> a particular item in the TOC but multiple PartialSeqScan nodes can exist.
>
> Currently, it does:
> pscan = shm_toc_lookup(node->ss.ps.toc, PARALLEL_KEY_SCAN);
>
> However, ExecPartialSeqScan() cannot know which index is its own,
> and it is not reasonable for it to pay attention to other nodes at this level.
> Even if PARALLEL_KEY_SCAN has multiple items, the PartialSeqScan node also
> needs to have an identifier.
>

Yes, that's right, and I think we can find that out. Basically we need to
know the number of the planned statement on which the current worker is
working, and we have to determine that anyway before the worker can start
its work. One way, as I explained above, is to use ParallelWorkerNumber
(ParallelWorkerNumber % num_planned_statements); we might need some more
sophisticated way to find that out, but we definitely need to know it
before the worker starts execution. Once we know it, we can use it to
find the PARALLEL_KEY_SCAN or whatever key for this worker (as the
position of PARALLEL_KEY_SCAN will be the same as that of the planned stmt
for a worker).
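
A minimal sketch of the modulo mapping described above, not code from the
patch. ScanSlot, stmt_index_for_worker() and scan_slot_for_worker() are
hypothetical names used only for illustration, and the sketch assumes one
scan descriptor per planned statement, stored at the same position as the
statement:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one per-statement shared scan descriptor. */
typedef struct ScanSlot
{
    int stmt_index;   /* planned statement this slot belongs to */
    int next_block;   /* illustrative shared scan position */
} ScanSlot;

/* Map a worker to a planned statement: worker number modulo statement count. */
static int
stmt_index_for_worker(int worker_number, int num_planned_statements)
{
    assert(worker_number >= 0);
    assert(num_planned_statements > 0);
    return worker_number % num_planned_statements;
}

/*
 * Assumption: the scan slot sits at the same position as the planned
 * statement, so the statement index doubles as the slot index.
 */
static ScanSlot *
scan_slot_for_worker(ScanSlot *slots, int num_planned_statements,
                     int worker_number)
{
    int idx = stmt_index_for_worker(worker_number, num_planned_statements);

    return &slots[idx];
}
```

With three planned statements, workers 0, 3 and 6 would share statement 0,
workers 1 and 4 statement 1, and so on. The sketch covers only the case of a
single scan descriptor per planned statement.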

> > > I think KaiGai's correct,
> > > and I pointed out the same problem to you before. The parallel key
> > > for the Partial Seq Scan needs to be allocated on the fly and carried
> > > in the node, or we'll never be able to push multiple things below the
> > > funnel.
> >
> > Okay, I don't immediately see the best way to achieve this, but
> > let us discuss this separately on the Parallel Seq Scan thread, and let me
> > know if you have something specific in mind. I will also give this
> > more thought.
> >
> I want to have a 'node_id' field in the Plan node; a unique identifier is
> then assigned to that field prior to serialization. It is a property of the
> Plan node, so we can reproduce this identifier on the background worker
> side using stringToNode(), and ExecPartialSeqScan can then pull out the
> proper field from the TOC segment by this node_id.
>

Okay, this can also work, but why introduce an identifier in the plan node
if it can work without one?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
