Re: Postgres 11 release notes

From: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>
To: Bruce Momjian <bruce(at)momjian(dot)us>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Postgres 11 release notes
Date: 2018-05-15 03:15:07
Message-ID: CAA4eK1LiHf_-Cg47QYmjr9+wrH=_GFUdE3cxD_WWSxHrmNf33g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-www

On Tue, May 15, 2018 at 2:05 AM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
> On Sun, May 13, 2018 at 04:40:19PM +0530, Amit Kapila wrote:
>> On Fri, May 11, 2018 at 8:38 PM, Bruce Momjian <bruce(at)momjian(dot)us> wrote:
>> > I have committed the first draft of the Postgres 11 release notes. I
>> > will add more markup soon. You can view the most current version here:
>> >
>> > http://momjian.us/pgsql_docs/release-11.html
>> >
>> > I expect a torrent of feedback. ;-)
>> >
>>
>> Thanks for compiling the first draft. I think there are few things
>> which we should add to the Parallel Queries section. (a) commit
>> e89a71fb449af2ef74f47be1175f99956cf21524 -
>> Pass InitPlan values to workers via Gather (Merge). We can write
>> something like "Allow the part of the plan that references an initplan
>> to run in parallel" or "Parallelise queries containing initplans",
>
> Uh, I need some explaination of what an initplan is
>

See the below comment in the code:
" This module is concerned with executing SubPlan expression nodes,
which should not be confused with sub-SELECTs appearing in FROM.
SubPlans are divided into "initplans", which are those that need only
one evaluation per query (among other restrictions, this requires that
they don't use any direct correlation variables from the parent plan
level), and "regular" subplans, which are re-evaluated every time
their result is required.

> and what type of
> query this helps.
>

Serial-Plan
-----------------
postgres=# explain select * from t1 where t1.k=(select max(k) from t3);
QUERY PLAN
--------------------------------------------------------------------
Seq Scan on t1 (cost=35.51..71.01 rows=10 width=12)
Filter: (k = $0)
InitPlan 1 (returns $0)
-> Aggregate (cost=35.50..35.51 rows=1 width=4)
-> Seq Scan on t3 (cost=0.00..30.40 rows=2040 width=4)
(5 rows)

Parallel-Plan
--------------------
postgres=# explain select * from t1 where t1.k=(select max(k) from t3);
QUERY PLAN
---------------------------------------------------------------------------------------
Gather (cost=9.71..19.38 rows=2 width=12)
Workers Planned: 2
Params Evaluated: $1
InitPlan 1 (returns $1)
-> Finalize Aggregate (cost=9.70..9.71 rows=1 width=4)
-> Gather (cost=9.69..9.70 rows=2 width=4)
Workers Planned: 2
-> Partial Aggregate (cost=9.69..9.70 rows=1 width=4)
-> Parallel Seq Scan on t3 (cost=0.00..8.75
rows=375 width=4)
-> Parallel Seq Scan on t1 (cost=0.00..9.67 rows=1 width=12)
Filter: (k = $1)
(11 rows)

Here, you can observe that initplan itself gets parallelized and the
part of plan tree referring initplan also got parallelized.

The other example from regression test is:

explain (costs off)
select count(*) from tenk1
where tenk1.unique1 = (Select max(tenk2.unique1) from tenk2);
QUERY PLAN
------------------------------------------------------
Aggregate
InitPlan 1 (returns $2)
-> Finalize Aggregate
-> Gather
Workers Planned: 2
-> Partial Aggregate
-> Parallel Seq Scan on tenk2
-> Gather
Workers Planned: 4
Params Evaluated: $2
-> Parallel Seq Scan on tenk1
Filter: (unique1 = $2)
(12 rows)

I can share more examples if required.

>> (b) "Parallelise queries that contains expensive functions in target
>> lists" (Commits - 3f90ec8597c3515e0d3190613b31491686027e4b and
>> 11cf92f6e2e13c0a6e3f98be3e629e6bd90b74d5).
>
> So we generate the entire query before the target list function calls,
> the run another parallel run just for those function calls?
>

No, it is not like that. We divide the scan among workers and each
worker should perform projection of the rows it scanned (after
applying filter). Now, if the expensive functions are part of target
lists, then we can push the computation of expensive functions (as
part of target list) in workers which will divide the work.

> Really? Do
> we run each column in its own worker or do we split the result set into
> parts and run those in parallel? How do we know, just the function call
> costs?
>

The function's cost can be determined via pg_proc->procost. For this
particular case, you can refer the call graph -
create_pathtarget->set_pathtarget_cost_width->cost_qual_eval_node->cost_qual_eval_walker->get_func_cost

> I can admit I never saw that coming.
>

I think the use case becomes interesting with parallel query because
now you can divide such cost among workers.

Feel free to ask more questions if above doesn't clarify the usage of
these features.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2018-05-15 03:46:02 Re: Postgres 11 release notes
Previous Message Dang Minh Huong 2018-05-15 03:12:15 Re: Postgres 11 release notes

Browse pgsql-www by date

  From Date Subject
Next Message Amit Langote 2018-05-15 03:46:02 Re: Postgres 11 release notes
Previous Message Dang Minh Huong 2018-05-15 03:12:15 Re: Postgres 11 release notes