From: | Rafia Sabih <rafia(dot)sabih(at)enterprisedb(dot)com> |
---|---|
To: | Robert Haas <robertmhaas(at)gmail(dot)com> |
Cc: | PostgreSQL Developers <pgsql-hackers(at)postgresql(dot)org> |
Subject: | Re: Enabling parallelism for queries coming from SQL or other PL functions |
Date: | 2017-03-21 10:06:22 |
Message-ID: | CAOGQiiMPSx=zBP2h+9jZvEaV3bYqMmKt-phST7wT1WrtZ_6o4A@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Wed, Mar 15, 2017 at 8:55 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Note this:
>
> if (completed || !fcache->returnsSet)
> postquel_end(es);
>
> When the SQL function doesn't return a set, then we can allow
> parallelism even when lazyEval is set, because we'll only call
> ExecutorStart() once. But my impression is that something like this:
Well, when I test following SQL function I see it cannot be
parallelised because lazyEval is true for it though it is not
returning set,
CREATE OR REPLACE FUNCTION not_parallel()
RETURNS bigint AS $$
BEGIN
SELECT count(distinct i) FROM t WHERE j = 12;
END;
$$ LANGUAGE sql;
Query Text:
SELECT count(distinct i) FROM t WHERE j = 12;
Aggregate (cost=34.02..34.02 rows=1 width=8) (actual
time=0.523..0.523 rows=1 loops=1)
-> Seq Scan on t (cost=0.00..34.01 rows=1 width=4) (actual
time=0.493..0.493 rows=0 loops=1)
Filter: (j = 12)
Rows Removed by Filter: 2001
2017-03-21 15:24:03.378 IST [117823] CONTEXT: SQL function
"already_parallel" statement 1
2017-03-21 15:24:03.378 IST [117823] LOG: duration: 94868.181 ms plan:
Query Text: select already_parallel();
Result (cost=0.00..0.26 rows=1 width=8) (actual
time=87981.047..87981.048 rows=1 loops=1)
already_parallel
------------------
0
(1 row)
As far as my understanding goes for this case, lazyEvalOk is set
irrespective of whether the function returns set or not in fmgr_sql,
else
{
randomAccess = false;
lazyEvalOK = true;
}
then it is passed to init_sql_fcache which is then passed to
init_execution_state where cache->lazyEval is set,
if (lasttages && fcache->junkFilter)
{
lasttages->setsResult = true;
if (lazyEvalOK &&
lasttages->stmt->commandType == CMD_SELECT &&
!lasttages->stmt->hasModifyingCTE)
fcache->lazyEval = lasttages->lazyEval = true;
}
Finally, this lazyEval is passed to ExecutorRun in postquel_getnext
that restricts parallelism by setting execute_once = 0,
/* Run regular commands to completion unless lazyEval */
uint64 count = (es->lazyEval) ? 1 : 0;
ExecutorRun(es->qd, ForwardScanDirection, count, !es->lazyEval);
So, this is my concern that why is such a query should not execute in
parallel when in SQL function. If I run this same query from PLpgsql
function then it can run in parallel,
CREATE OR REPLACE FUNCTION not_parallel()
RETURNS bigint AS $$
declare cnt int:=0;
BEGIN
SELECT count(distinct i) into cnt FROM t WHERE j = 12;
RETURN cnt;
END;
$$ LANGUAGE plpgsql;
select not_parallel();
2017-03-21 15:28:56.282 IST [123086] LOG: duration: 0.003 ms plan:
Query Text: SELECT count(distinct i) FROM t WHERE j = 12
Parallel Seq Scan on t (cost=0.00..19.42 rows=1 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
Filter: (j = 12)
2017-03-21 15:28:56.282 IST [123087] LOG: duration: 0.003 ms plan:
Query Text: SELECT count(distinct i) FROM t WHERE j = 12
Parallel Seq Scan on t (cost=0.00..19.42 rows=1 width=4) (actual
time=0.001..0.001 rows=0 loops=1)
Filter: (j = 12)
2017-03-21 15:28:57.530 IST [117823] LOG: duration: 1745.372 ms plan:
Query Text: SELECT count(distinct i) FROM t WHERE j = 12
Aggregate (cost=19.42..19.43 rows=1 width=8) (actual
time=1255.743..1255.743 rows=1 loops=1)
-> Gather (cost=0.00..19.42 rows=1 width=4) (actual
time=1255.700..1255.700 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Parallel Seq Scan on t (cost=0.00..19.42 rows=1 width=4)
(actual time=418.443..418.443 rows=0 loops=3)
Filter: (j = 12)
Rows Removed by Filter: 667
2017-03-21 15:28:57.530 IST [117823] CONTEXT: SQL statement "SELECT
count(distinct i) FROM t WHERE j = 12"
PL/pgSQL function not_parallel() line 4 at SQL statement
2017-03-21 15:28:57.531 IST [117823] LOG: duration: 2584.282 ms plan:
Query Text: select not_parallel();
Result (cost=0.00..0.26 rows=1 width=8) (actual
time=2144.315..2144.316 rows=1 loops=1)
not_parallel
--------------
0
(1 row)
Hence, it appears lazyEval is the main reason behind it and it should
be definitely fixed in my opinion.
Please enlighten me with your comments/opinions.
Regards,
Rafia Sabih
EnterpriseDB: http://www.enterprisedb.com/
From | Date | Subject | |
---|---|---|---|
Next Message | Amit Kapila | 2017-03-21 10:08:10 | Re: pageinspect and hash indexes |
Previous Message | Pavel Stehule | 2017-03-21 10:01:33 | Re: Other formats in pset like markdown, rst, mediawiki |