Re: Parallel sec scan in plpgsql

From: Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru>
To: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
Cc: Ashutosh Bapat <ashutosh(dot)bapat(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Parallel sec scan in plpgsql
Date: 2016-09-20 11:15:45
Message-ID: 6d1ff632-f0b5-079f-24c6-c4981b65ae8e@postgrespro.ru
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 18.09.2016 06:54, Amit Kapila wrote:
> On Fri, Sep 16, 2016 at 8:48 PM, Alex Ignatov <a(dot)ignatov(at)postgrespro(dot)ru> wrote:
>>
>> On 16.09.2016 16:50, Amit Kapila wrote:
>>>
>>>
>>> Can you try by setting force_parallel_mode = off;? I think it is
>>> sending the whole function execution to worker due to
>>> force_parallel_mode.
>>>
>>>
>>
>> No changes:
>>
>
> Okay, it just skipped from my mind that we don't support parallel
> queries for SQL statement execution (or statements executed via
> exec_stmt_execsql) from plpgsql. For detailed explanation of why that
> is not feasible you can refer one of my earlier e-mails [1] on similar
> topic. I think if we can somehow get the results via Perform
> statement, then it could be possible to use parallelism via plpgsql.
>
> However, you can use it via SQL functions, an example is below:
>
> set min_parallel_relation_size =0;
> set parallel_tuple_cost=0;
> set parallel_setup_cost=0;
>
> Load 'auto_explain';
> set auto_explain.log_min_duration = 0;
> set auto_explain.log_analyze = true;
> set auto_explain.log_nested_statements = true;
>
> create table test_plpgsql(c1 int, c2 char(1000));
> insert into test_plpgsql values(generate_series(1,100000),'aaa');
>
> create or replace function parallel_test_set_sql() returns
> setof bigint as $$
> select count(*) from test_plpgsql;
> $$language sql PARALLEL SAFE STRICT STABLE;
>
> Then execute function as: select * from parallel_test_set_sql(); You
> can see below plan if auto_explain module is loaded.
>
> Finalize Aggregate (cost=14806.85..14806.86 rows=1 width=8) (actual tim
> e=1094.966..1094.967 rows=1 loops=1)
> -> Gather (cost=14806.83..14806.84 rows=2 width=8) (actual time=472.
> 216..1094.943 rows=3 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> -> Partial Aggregate (cost=14806.83..14806.84 rows=1 width=8)
> (actual time=177.867..177.868 rows=1 loops=3)
> -> Parallel Seq Scan on test_plpgsql (cost=0.00..14702.6
> 7 rows=41667 width=0) (actual time=0.384..142.565 rows=33333 loops=3)
> CONTEXT: SQL function "parallel_test_set_sql" statement 1
> LOG: duration: 2965.040 ms plan:
> Query Text: select * from parallel_test_set_sql();
> Function Scan on parallel_test_set_sql (cost=0.25..10.25 rows=1000 widt
> h=8) (actual time=2538.620..2776.955 rows=1 loops=1)
>
>
> In general, I think we should support the cases as required (or
> written) by you from plpgsql or sql functions. We need more work to
> support such cases. There are probably two ways of supporting such
> cases, we can build some intelligence in plpgsql execution such that
> it can recognise such queries and allow to use parallelism or we need
> to think of enabling parallelism for cases where we don't run the plan
> to completion. Most of the use cases from plpgsql or sql function
> fall into later category as they don't generally run the plan to
> completion.
>
>
> [1] - https://www.postgresql.org/message-id/CAA4eK1K8kaO_jRk42-o2rmhSRbKV-3mR%2BiNVcONLdbcSXW5TfQ%40mail.gmail.com
>

Thank you for you sugestion! That works.

But what we can do with this function:
create or replace function parallel_test_sql(t int) returns setof bigint as
$$
select count(*) from (select a,b,c,d,e,sum(bytes) from test where a>= $1 group by a,b,c,d,e)t;
$$ language sql PARALLEL SAFE STRICT STABLE;

explain (analyze,buffers) select * from parallel_test_sql(2);

"Function Scan on parallel_test_sql (cost=0.25..10.25 rows=1000 width=8) (actual time=2410.789..2410.790 rows=1 loops=1)"
" Buffers: shared hit=63696"
"Planning time: 0.082 ms"
"Execution time: 2410.841 ms"

2016-09-20 14:09:04 MSK [13037]: [75-1] user=ipdr,db=ipdr,app=pgAdmin III - Query Tool,client=127.0.0.1 LOG: duration: 2410.135 ms plan:
Query Text:
select count(*) from (select a,b,c,d,e,sum(bytes) from test where a>= $1 group by a,b,c,d,e)t;

Aggregate (cost=230701.42..230701.43 rows=1 width=8)
-> HashAggregate (cost=230363.59..230513.74 rows=15015 width=28)
Group Key: test.a, test.b, test.c, test.d, test.e
-> Seq Scan on test (cost=0.00..188696.44 rows=3333372 width=20)
Filter: (a >= $1)

No parallelism again. Looks like that Filter: (a >= $1) breaks parallelism

Alex Ignatov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Matteo Beccati 2016-09-20 12:19:38 Re: kqueue
Previous Message valeriof 2016-09-20 11:02:17 Re: Error running custom plugin: “output plugins have to declare the _PG_output_plugin_init symbol”