Re: Parallel Query - Can it be used within functions?

From: Michael Krüger <michael(at)kruegers(dot)email>
To: Andreas Kretschmer <andreas(at)a-kretschmer(dot)de>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Parallel Query - Can it be used within functions?
Date: 2018-02-28 08:08:47
Message-ID: CACSnzzXtX1t9r2pvnaYgGrkcWEXLdRFjKOnmhRVsBadcRASgnw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

ah, and for completeness the simple python function I use for the test:

create or replace function reports.generic_query_python(_sql text)
RETURNS SETOF record
LANGUAGE 'plpythonu'
PARALLEL SAFE
COST 100
VOLATILE
ROWS 5000
AS $BODY$
return plpy.execute( _sql )
$BODY$;

Michael Krüger <michael(at)kruegers(dot)email> schrieb am Mi., 28. Feb. 2018 um
09:05 Uhr:

> Ok, to close this thread. The problem is, that plpgsql function do seem to
> return data using a cursor. That fact is disabling parallel execution. So
> if we instead hand over the SQL to a function with e.g. a python body, then
> parallel execution is happening, because the data is first assembled in
> memory before it is returned, without using a cursor:
>
> mkrueger=# explain analyze select * from reports.generic_query_python($$
> select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where
> mediatrunkid in
> (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
> group by 1 $$) as foo (mediatrunkid bigint, count numeric);
> LOG: 00000: duration: 35.158 ms plan:
> Query Text: select mediatrunkid,count(*)::numeric from
> reports.mediatrunkkpi where mediatrunkid in
> (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
> group by 1
> Partial HashAggregate (cost=56693.72..56696.72 rows=300 width=16) (actual
> time=35.144..35.149 rows=17 loops=1)
> Group Key: mediatrunkid
> Buffers: shared hit=1641
> -> Parallel Bitmap Heap Scan on mediatrunkkpi (cost=4525.01..56279.28
> rows=82889 width=8) (actual time=2.350..24.584 rows=63794 loops=1)
> Recheck Cond: (mediatrunkid = ANY
> ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
> Heap Blocks: exact=1641
> Buffers: shared hit=1641
> -> Bitmap Index Scan on idx_mediatrunkkpi_trunk
> (cost=0.00..4475.27 rows=198933 width=0) (never executed)
> Index Cond: (mediatrunkid = ANY
> ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
> LOCATION: explain_ExecutorEnd, auto_explain.c:359
> LOG: 00000: duration: 35.165 ms plan:
> Query Text: select mediatrunkid,count(*)::numeric from
> reports.mediatrunkkpi where mediatrunkid in
> (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
> group by 1
> Partial HashAggregate (cost=56693.72..56696.72 rows=300 width=16) (actual
> time=35.152..35.157 rows=17 loops=1)
> Group Key: mediatrunkid
> Buffers: shared hit=1630
> -> Parallel Bitmap Heap Scan on mediatrunkkpi (cost=4525.01..56279.28
> rows=82889 width=8) (actual time=2.364..24.702 rows=63272 loops=1)
> Recheck Cond: (mediatrunkid = ANY
> ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
> Heap Blocks: exact=1630
> Buffers: shared hit=1630
> -> Bitmap Index Scan on idx_mediatrunkkpi_trunk
> (cost=0.00..4475.27 rows=198933 width=0) (never executed)
> Index Cond: (mediatrunkid = ANY
> ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
> LOCATION: explain_ExecutorEnd, auto_explain.c:359
> LOG: 00000: duration: 47.855 ms plan:
> Query Text: select mediatrunkid,count(*)::numeric from
> reports.mediatrunkkpi where mediatrunkid in
> (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
> group by 1
> Finalize GroupAggregate (cost=57784.41..57792.66 rows=300 width=40)
> (actual time=45.331..45.344 rows=17 loops=1)
> Group Key: mediatrunkid
> Buffers: shared hit=2735
> -> Sort (cost=57784.41..57785.91 rows=600 width=16) (actual
> time=45.322..45.325 rows=51 loops=1)
> Sort Key: mediatrunkid
> Sort Method: quicksort Memory: 27kB
> Buffers: shared hit=2735
> -> Gather (cost=57693.72..57756.72 rows=600 width=16) (actual
> time=45.270..45.295 rows=51 loops=1)
> Workers Planned: 2
> Workers Launched: 2
> Buffers: shared hit=2735
> -> Partial HashAggregate (cost=56693.72..56696.72 rows=300
> width=16) (actual time=38.387..38.391 rows=17 loops=3)
> Group Key: mediatrunkid
> Buffers: shared hit=6006
> -> Parallel Bitmap Heap Scan on mediatrunkkpi
> (cost=4525.01..56279.28 rows=82889 width=8) (actual time=5.564..27.399
> rows=67080 loops=3)
> Recheck Cond: (mediatrunkid = ANY
> ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
> Heap Blocks: exact=1912
> Buffers: shared hit=6006
> -> Bitmap Index Scan on
> idx_mediatrunkkpi_trunk (cost=0.00..4475.27 rows=198933 width=0) (actual
> time=11.229..11.229 rows=201241 loops=1)
> Index Cond: (mediatrunkid = ANY
> ('{2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028}'::bigint[]))
> Buffers: shared hit=823
> LOCATION: explain_ExecutorEnd, auto_explain.c:359
> LOG: 00000: duration: 49.924 ms plan:
> Query Text: explain analyze select * from reports.generic_query_python($$
> select mediatrunkid,count(*)::numeric from reports.mediatrunkkpi where
> mediatrunkid in
> (2129932,2101261,2125841,409613,409612,2142220,2101260,2129933,2125839,2138124,2117644,2125836,2187276,2154508,2101263,2109452,2134028)
> group by 1 $$) as foo (mediatrunkid bigint, count numeric);
> Function Scan on generic_query_python foo (cost=0.25..50.25 rows=5000
> width=40) (actual time=49.920..49.922 rows=17 loops=1)
> Buffers: shared hit=6388
> LOCATION: explain_ExecutorEnd, auto_explain.c:359
> QUERY PLAN
>
> -------------------------------------------------------------------------------------------------------------------------------
> Function Scan on generic_query_python foo (cost=0.25..50.25 rows=5000
> width=40) (actual time=49.920..49.922 rows=17 loops=1)
> Planning time: 0.029 ms
> Execution time: 49.977 ms
> (3 rows)
>
>
>
>
>
> Michael Krüger <michael(at)kruegers(dot)email> schrieb am Fr., 16. Feb. 2018 um
> 11:42 Uhr:
>
>> Dear all,
>>
>> still same behavior with Postgres 10.2 ...
>>
>> Just as a reminder that the issue still exists.
>>
>> Regards,
>> Michael
>>
>> Andreas Kretschmer <andreas(at)a-kretschmer(dot)de> schrieb am Di., 6. Feb.
>> 2018 um 08:35 Uhr:
>>
>>> Hi,
>>>
>>>
>>> Am 06.02.2018 um 08:24 schrieb Michael Krüger:
>>> > create or replace function reports.generic_query(_sql text)
>>> > RETURNS SETOF record
>>> > LANGUAGE 'plpgsql'
>>> > PARALLEL SAFE
>>> > COST 100
>>>
>>> there is an other parameter, parallel_setup_cost, with default = 1000. I
>>> think, you should set this parameter too.
>>>
>>> Please keep me informed, it is interessting me.
>>>
>>>
>>> Regards, Andreas
>>>
>>> --
>>> 2ndQuadrant - The PostgreSQL Support Company.
>>> www.2ndQuadrant.com
>>>
>>>
>>>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message mariusz 2018-02-28 10:49:06 Re: Given a set of daterange, finding the continuous range that includes a particular date (aggregates)
Previous Message Michael Krüger 2018-02-28 08:05:51 Re: Parallel Query - Can it be used within functions?