Re: Parallel Seq Scan

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Amit Langote <amitlangote09(at)gmail(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-01-22 05:30:02
Message-ID: CAA4eK1Jf8Bxt2BHL-o-xJqK0RJn75_yrs9NoEKALRYYMaJ9Tng@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jan 22, 2015 at 10:44 AM, Amit Langote <
Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp> wrote:
>
> On 21-01-2015 PM 09:43, Amit Kapila wrote:
> > On Wed, Jan 21, 2015 at 4:31 PM, Amit Langote <amitlangote09(at)gmail(dot)com>
> > wrote:
> >> On Wednesday, January 21, 2015, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> > wrote:
> >>>
> >>>
> >>> Does it happen only when parallel_seqscan_degree >
max_worker_processes?
> >>
> >>
> >> I have max_worker_processes set to the default of 8 while
> > parallel_seqscan_degree is 4. So, this may be a case different from
Thom's.
> >>
> >
> > I think this is due to reason that memory for forming
> > tuple in master backend is retained for longer time which
> > is causing this statement to take much longer time than
> > required. I have fixed the other issue as well reported by
> > you in attached patch.
> >
>
> Thanks for fixing.
>
> > I think this patch is still not completely ready for general
> > purpose testing, however it could be helpful if we can run
> > some tests to see in what kind of scenario's it gives benefit
> > like in the test you are doing if rather than increasing
> > seq_page_cost, you should add an expensive WHERE condition
> > so that it should automatically select parallel plan. I think it is
better
> > to change one of the new parameter's (parallel_setup_cost,
> > parallel_startup_cost and cpu_tuple_comm_cost) if you want
> > your statement to use parallel plan, like in your example if
> > you would have reduced cpu_tuple_comm_cost, it would have
> > selected parallel plan, that way we can get some feedback about
> > what should be the appropriate default values for the newly added
> > parameters. I am already planing to do some tests in that regard,
> > however if I get some feedback from other's that would be helpful.
> >
> >
>
> Perhaps you are aware or you've postponed working on it, but I see that
> a plan executing in a worker does not know about instrumentation.

I have deferred it until other main parts are stabilised/reviewed. Once
that is done, we can take a call what is best we can do for instrumentation.
Thom has reported the same as well upthread.

> Note the "Rows Removed by Filter". I guess the difference may be
> because, all the rows filtered by workers were not accounted for. I'm
> not quite sure, but since exec_worker_stmt goes the Portal way,
> QueryDesc.instrument_options remains unset and hence no instrumentation
> opportunities in a worker backend. One option may be to pass
> instrument_options down to worker_stmt?
>

I think there is more to it, master backend need to process that information
as well.

> By the way, 17s and 6s compare really well in favor of parallel seqscan
> above, :)
>

That sounds interesting.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-01-22 07:32:48 Re: Better way of dealing with pgstat wait timeout during buildfarm runs?
Previous Message Amit Langote 2015-01-22 05:26:50 Re: Parallel Seq Scan