Re: using custom scan nodes to prototype parallel sequential scan

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: using custom scan nodes to prototype parallel sequential scan
Date: 2014-11-11 04:54:10
Message-ID: CAA4eK1KGuNut4H5K3_j52xbzJs+YqbKNYUQ0PtftBN1MH8Nd1g@mail.gmail.com
Lists: pgsql-hackers

On Tue, Nov 11, 2014 at 9:42 AM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> wrote:
>
> On Tue, Nov 11, 2014 at 2:35 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
> > On Tue, Nov 11, 2014 at 5:30 AM, Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com> wrote:
> >>
> >> On Tue, Nov 11, 2014 at 10:21 AM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> >> > On 2014-11-10 10:57:16 -0500, Robert Haas wrote:
> >> >> Does parallelism help at all?
> >> >
> >> > I'm pretty damn sure. We can't even make mildly powerful storage
> >> > fully busy right now. Heck, I can't make my workstation's storage
> >> > with a RAID 10 out of four spinning disks fully busy.
> >> >
> >> > I think some of that benefit also could be reaped by being better at
> >> > hinting the OS...
> >>
> >> Yes, it definitely helps, and not only for I/O-bound operations.
> >> It also gives a good gain for queries with CPU-intensive WHERE
> >> conditions.
> >>
> >> One more point we may need to consider: is there any overhead in
> >> passing the data rows from the workers to the backend?
> >
> > I am not sure that overhead will be very visible if we improve the
> > use of the I/O subsystem by making parallel tasks work on it.
>
> I feel there may be an overhead, because the workers need to put the
> result data in shared memory and the backend has to read it from there
> to process it further. If the cost of transferring data from a worker
> to the backend is higher than that of fetching a tuple from the scan,
> the overhead becomes visible when the selectivity is high.
>
> > However, another idea here could be that instead of passing the tuple
> > data we just pass the tuple id, but in that case we have to retain the
> > pin on the buffer that contains the tuple until the master backend
> > reads it, which might have its own kinds of problems.
>
> Transferring the tuple id doesn't cover the scenarios where the node
> needs any projection.

Hmm, that's why I said we need to retain the buffer pin, so that we can
still get at the tuple data.
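
To make that overhead concrete, here is a minimal sketch of the
data-passing approach, assuming the shm_mq machinery that went into 9.4.
The names (worker_send_tuples, outq) are invented for illustration and
error handling is omitted; the point is that every tuple is copied into
the shared queue by the worker and read back out again by the master:

#include "postgres.h"

#include "access/heapam.h"
#include "access/htup_details.h"
#include "storage/shm_mq.h"

/*
 * Hypothetical worker-side loop: push every tuple returned by the scan
 * into a shared memory queue.  shm_mq_send() copies the tuple into the
 * ring buffer, and the master pays for reading it back out.
 */
static void
worker_send_tuples(HeapScanDesc scan, shm_mq_handle *outq)
{
    HeapTuple   tup;

    while ((tup = heap_getnext(scan, ForwardScanDirection)) != NULL)
    {
        /* Blocks when the queue is full and the master is slow to drain it. */
        if (shm_mq_send(outq, tup->t_len, tup->t_data, false) != SHM_MQ_SUCCESS)
            break;              /* master detached; stop scanning */
    }
}

If the selectivity is high, those per-tuple copies (plus the queue
signalling) are exactly what could rival the cost of heap_getnext()
itself, as Hari suggests.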
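
And a sketch of the tuple-id variant, again only an illustration: assume
the worker keeps its pin on the scanned page (so the page cannot be
evicted from shared_buffers) and sends just the ItemPointer; the master
then refetches the tuple, which also lets it evaluate any projection.
The function name master_fetch_tuple and the flow control are made up:

#include "postgres.h"

#include "access/heapam.h"
#include "access/htup_details.h"
#include "storage/bufmgr.h"

/*
 * Hypothetical master-side fetch: the worker sent only a TID.  Because
 * the worker still holds a pin on the page, the page is resident and
 * the buffer lookup inside heap_fetch() is cheap.
 */
static HeapTuple
master_fetch_tuple(Relation rel, Snapshot snapshot, ItemPointer recv_tid)
{
    HeapTupleData tupdata;
    HeapTuple   result = NULL;
    Buffer      buf;

    tupdata.t_self = *recv_tid;
    if (heap_fetch(rel, snapshot, &tupdata, &buf, false, NULL))
    {
        result = heap_copytuple(&tupdata);  /* copy before unpinning */
        ReleaseBuffer(buf);                 /* heap_fetch pinned it for us */
    }
    return result;
}

The problem I was alluding to is the coordination: the worker cannot
release its pins until the master has consumed the corresponding TIDs,
so a slow master can leave workers holding pins on many pages at once.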

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
