Re: using custom scan nodes to prototype parallel sequential scan

From: Kouhei Kaigai <kaigai(at)ak(dot)jp(dot)nec(dot)com>
To: Haribabu Kommi <kommi(dot)haribabu(at)gmail(dot)com>, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Andres Freund <andres(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: using custom scan nodes to prototype parallel sequential scan
Date: 2014-11-11 07:51:06
Message-ID: 9A28C8860F777E439AA12E8AEA7694F801075013@BPXM15GP.gisp.nec.co.jp
Lists: pgsql-hackers

> -----Original Message-----
> From: pgsql-hackers-owner(at)postgresql(dot)org
> [mailto:pgsql-hackers-owner(at)postgresql(dot)org] On Behalf Of Haribabu Kommi
> Sent: Tuesday, November 11, 2014 1:13 PM
> To: Amit Kapila
> Cc: Andres Freund; Robert Haas; Simon Riggs; Tom Lane;
> pgsql-hackers(at)postgresql(dot)org
> Subject: Re: [HACKERS] using custom scan nodes to prototype parallel
> sequential scan
>
> On Tue, Nov 11, 2014 at 2:35 PM, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
> > On Tue, Nov 11, 2014 at 5:30 AM, Haribabu Kommi
> > <kommi(dot)haribabu(at)gmail(dot)com>
> > wrote:
> >>
> >> On Tue, Nov 11, 2014 at 10:21 AM, Andres Freund
> >> <andres(at)2ndquadrant(dot)com>
> >> wrote:
> >> > On 2014-11-10 10:57:16 -0500, Robert Haas wrote:
> >> >> Does parallelism help at all?
> >> >
> >> > I'm pretty damn sure. We can't even make mildly powerful storage
> >> > fully busy right now. Heck, I can't make my workstation's storage
> >> > with a RAID 10 out of four spinning disks fully busy.
> >> >
> >> > I think some of that benefit also could be reaped by being better
> >> > at hinting the OS...
> >>
> >> Yes, it definitely helps, and not only for I/O-bound operations.
> >> It also gives a good gain for queries with CPU-intensive WHERE
> >> conditions.
> >>
> >> One more point we may need to consider: is there any overhead in
> >> passing data rows from the workers to the backend?
> >
> > I am not sure that overhead will be very visible if we improve the
> > use of the I/O subsystem by making parallel tasks work on it.
>
> I feel there may be an overhead, because the workers need to put the result
> data in shared memory and the backend has to read it from there to process
> it further. If the cost of transferring data from a worker to the backend is
> higher than that of fetching a tuple from the scan, the overhead becomes
> visible as selectivity increases.
>
In my experience, the data copy and the transformation needed to fit the
TupleTableSlot format are the biggest overhead, rather than the scan or join
itself...
Probably, a straightforward way is to construct the values/isnull arrays on a
shared memory segment; the backend process then just switches the
tts_values/tts_isnull pointers, with no data copy. It gave us a performance
gain.
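
Below is a minimal sketch of that pointer-switching idea, assuming the worker
has already laid out one tuple's values/isnull arrays in a shared memory
segment. The shm_row struct and fetch_row_from_shm() are hypothetical names
for illustration only; TupleTableSlot, tts_values/tts_isnull, and
ExecStoreVirtualTuple() are the actual executor interfaces.

#include "postgres.h"
#include "executor/tuptable.h"

/* hypothetical layout of one result row in the shared segment */
typedef struct shm_row
{
    Datum  *values;     /* natts Datum entries built by the worker */
    bool   *isnull;     /* natts null flags built by the worker */
} shm_row;

static TupleTableSlot *
fetch_row_from_shm(TupleTableSlot *slot, shm_row *row)
{
    ExecClearTuple(slot);

    /*
     * Switch the slot's pointers to the worker-built arrays instead of
     * copying each Datum.  Pass-by-reference Datums must then point at
     * memory that stays valid (the shared segment) for as long as the
     * slot is in use.
     */
    slot->tts_values = row->values;
    slot->tts_isnull = row->isnull;

    return ExecStoreVirtualTuple(slot);
}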

Thanks,
--
NEC OSS Promotion Center / PG-Strom Project
KaiGai Kohei <kaigai(at)ak(dot)jp(dot)nec(dot)com>
