Re: Parallel Seq Scan

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Amit Langote <amitlangote09(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Fabrízio Mello <fabriziomello(at)gmail(dot)com>, Thom Brown <thom(at)linux(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel Seq Scan
Date: 2015-01-22 05:14:37
Message-ID: 54C0873D.3070001@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 21-01-2015 PM 09:43, Amit Kapila wrote:
> On Wed, Jan 21, 2015 at 4:31 PM, Amit Langote <amitlangote09(at)gmail(dot)com>
> wrote:
>> On Wednesday, January 21, 2015, Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
> wrote:
>>>
>>>
>>> Does it happen only when parallel_seqscan_degree > max_worker_processes?
>>
>>
>> I have max_worker_processes set to the default of 8 while
> parallel_seqscan_degree is 4. So, this may be a case different from Thom's.
>>
>
> I think this is due to reason that memory for forming
> tuple in master backend is retained for longer time which
> is causing this statement to take much longer time than
> required. I have fixed the other issue as well reported by
> you in attached patch.
>

Thanks for fixing.

> I think this patch is still not completely ready for general
> purpose testing, however it could be helpful if we can run
> some tests to see in what kind of scenario's it gives benefit
> like in the test you are doing if rather than increasing
> seq_page_cost, you should add an expensive WHERE condition
> so that it should automatically select parallel plan. I think it is better
> to change one of the new parameter's (parallel_setup_cost,
> parallel_startup_cost and cpu_tuple_comm_cost) if you want
> your statement to use parallel plan, like in your example if
> you would have reduced cpu_tuple_comm_cost, it would have
> selected parallel plan, that way we can get some feedback about
> what should be the appropriate default values for the newly added
> parameters. I am already planing to do some tests in that regard,
> however if I get some feedback from other's that would be helpful.
>
>

Perhaps you are aware or you've postponed working on it, but I see that
a plan executing in a worker does not know about instrumentation. It
results in the EXPLAIN ANALYZE showing incorrect figures. For example
compare the normal seqscan and parallel seqscan below:

postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE sqrt(a) < 3456 AND
md5(a::text) LIKE 'ac%';
QUERY PLAN

---------------------------------------------------------------------------------------------------------------
Seq Scan on test (cost=0.00..310228.52 rows=16120 width=4) (actual
time=0.497..17062.436 rows=39028 loops=1)
Filter: ((sqrt((a)::double precision) < 3456::double precision) AND
(md5((a)::text) ~~ 'ac%'::text))
Rows Removed by Filter: 9960972
Planning time: 0.206 ms
Execution time: 17378.413 ms
(5 rows)

postgres=# EXPLAIN ANALYZE SELECT * FROM test WHERE sqrt(a) < 3456 AND
md5(a::text) LIKE 'ac%';
QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------
Parallel Seq Scan on test (cost=0.00..255486.08 rows=16120 width=4)
(actual time=7.329..4906.981 rows=39028 loops=1)
Filter: ((sqrt((a)::double precision) < 3456::double precision) AND
(md5((a)::text) ~~ 'ac%'::text))
Rows Removed by Filter: 1992710
Number of Workers: 4
Number of Blocks Per Worker: 8849
Planning time: 0.137 ms
Execution time: 6077.782 ms
(7 rows)

Note the "Rows Removed by Filter". I guess the difference may be
because, all the rows filtered by workers were not accounted for. I'm
not quite sure, but since exec_worker_stmt goes the Portal way,
QueryDesc.instrument_options remains unset and hence no instrumentation
opportunities in a worker backend. One option may be to pass
instrument_options down to worker_stmt?

By the way, 17s and 6s compare really well in favor of parallel seqscan
above, :)

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2015-01-22 05:26:50 Re: Parallel Seq Scan
Previous Message Amit Kapila 2015-01-22 05:00:50 Re: Parallel Seq Scan