Re: Parallel Seq Scan

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2014-12-06 07:06:17
Message-ID: CAA4eK1JfOcyEpdg_-Q+x9hVhVrsj85F74NNq_ns9hSOYN9eWLA@mail.gmail.com
Lists: pgsql-hackers

On Sat, Dec 6, 2014 at 10:43 AM, David Rowley <dgrowleyml(at)gmail(dot)com> wrote:
> On 4 December 2014 at 19:35, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com> wrote:
>>
>> Attached patch is just to facilitate the discussion about the
>> parallel seq scan and maybe some other dependent tasks, like
>> sharing of various states (combocid, snapshot) with parallel
>> workers. It is by no means ready for any complex testing; of course
>> I will work towards making it more robust, both in terms of adding
>> more stuff and doing performance optimizations.
>>
>> Thoughts/Suggestions?
>>
>
> This is good news!

Thanks.

> I've not gotten to look at the patch yet, but I thought you may be able
> to make use of the attached at some point.
>

I think so too; it can be used in the near future to enhance
and add more value to the parallel scan feature. Thanks
for taking the initiative to do the leg-work for supporting
aggregates.

> It's bare-bones core support for allowing aggregate states to be merged
> together with another aggregate state. I would imagine that if a query such
> as:
>
> SELECT MAX(value) FROM bigtable;
>
> was run, then a series of parallel workers could go off and each find the
> max value from their portion of the table and then perhaps some other node
> type would then take all the intermediate results from the workers, once
> they're finished, and join all of the aggregate states into one and return
> that. Naturally, you'd need to check that all aggregates used in the
> targetlist had a merge function first.
>

The direction sounds right.

> This is just a few hours of work. I've not really tested the pg_dump
> support or anything yet. I've also not added any new functions to allow
> AVG() or COUNT() to work, I've really just re-used existing functions where
> I could, as things like MAX() and BOOL_OR() can just make use of the
> existing transition function. I thought that this might be enough for early
> tests.
>
> I'd imagine such a workload, ignoring IO overhead, should scale pretty
> much linearly with the number of worker processes. Of course, if there was
> a GROUP BY clause then the merger code would have to perform more work.
>

Agreed.

> If you think you might be able to make use of this, then I'm willing to
> go off and write all the other merge functions required for the other
> aggregates.
>

Don't you think we should first stabilize the basic parallel
scan (target list and quals that can be evaluated independently
by workers) and then move on to such enhancements?

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
