Re: Query is over 2x slower with jit=on

From: Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: Andreas Joseph Krogh <andreas(at)visena(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, "Jonathan S(dot) Katz" <jkatz(at)postgresql(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Query is over 2x slower with jit=on
Date: 2018-09-14 11:18:24
Message-ID: CAJ3gD9cWBKnzXkhu30njaY1au3NO0XD=r4N3GcC_nwFbw5GYYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 11 September 2018 at 14:50, Amit Khandekar <amitdkhan(dot)pg(at)gmail(dot)com> wrote:
> On 10 September 2018 at 21:39, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> Hi,
>>
>> On 2018-09-10 15:42:55 +0530, Amit Khandekar wrote:
>>> Attached is a patch that accumulates the per-worker jit counters into
>>> the leader process.
>>
>> Thanks!
>>
>>
>>> I think we better show per-worker jit info also. The current patch
>>> does not show that. I think it would be easy to continue on the patch
>>> to show per-worker info also. Under the Gather node, we can show
>>> per-worker jit counters. I think this would be useful too, besides the
>>> cumulative figures in the leader process. Comments ?
>>
>> Yes, I think that'd be good.
> Ok. Will continue on the patch.
>
>> I think we either should print the stats at
>> the top level as $leader_value, $combined_worker_value, $total_value or
>> just have the $combined_worker_value at the level where we print other
>> stats from the worker, too.
>
> Yes, I think we can follow and be consistent with whatever way in
> which the other worker stats are printed. Will check.
>
> Note: Since there can be a multiple separate Gather plans under a plan
> tree, I think we can show this info for each Gather plan.

The attached patch shows per-worker information, besides the
cumulative figure in the end of plan. Attached is the complete output
of an aggregate parallel query on tenk1 table (created using the
regression tests). You can see that the per-worker figures are under
each of the Gather plans.

We can show combined values of all the workers under a Gather plan as
one single value, but I guess if we just show per-worker values, we
don't have to additionally show combined value. Comments ?
Also, I have kept the the per-worker info only for verbose=true.

The Gather plan's leader value is not shown. We also don't show
leader-specific values when we show the usual per-worker
instrumentation values. So I think we are consistent. Actually it is a
bit tedious to collect only leader-specific values. And I don't think
it is worth trying for it. For worker, I am not collecting
per-plan-node info, because we want to show total worker figures, and
not per-plan-node figures. For the normal instrumentations, we want to
show per-node info. For leader-specific instrumentation, we would need
to store per-(Gather-)plan info.

Need to finalize the indentation and the other explain formats. One
option is to keep the per-worker JIT info in a single line, like how
we print the normal per-worker instrumentation :
Worker 1: actual time=20.971..35.975 rows=39991 loops=1

Below is a snippet :

Aggregate (cost=18656.00..18656.01 rows=1 width=8) (actual
time=593.917..593.917 rows=1 loops=1)
Output: count(*)
-> Hash Join (cost=10718.00..18456.00 rows=80000 width=0) (actual
time=496.650..593.883 rows=16 loops=1)
Inner Unique: true
Hash Cond: ((a.unique1 = b.unique1) AND (a.two =
(row_number() OVER (?))))
-> Gather (cost=0.00..5918.00 rows=160000 width=8) (actual
time=192.319..223.384 rows=160000 loops=1)
Output: a.unique1, a.two
Workers Planned: 4
Workers Launched: 4
Jit for Worker : 0
Functions: 2
Generation Time: 0.273 ms
Inlining: true
Inlining Time: 43.686 ms
Optimization: true
Optimization Time: 10.788 ms
Emission Time: 8.438 ms
Jit for Worker : 1
Functions: 2
Generation Time: 0.293 ms
Inlining: true
Inlining Time: 72.587 ms
Optimization: true
Optimization Time: 10.386 ms
Emission Time: 8.115 ms
..........
..........
..........
Planning Time: 0.548 ms
Jit:
Functions: 40
Generation Time: 3.892 ms
Inlining: true
Inlining Time: 409.397 ms
Optimization: true
Optimization Time: 174.708 ms
Emission Time: 91.785 ms
Execution Time: 610.262 ms
(98 rows)

>
>>
>>
>>> /*
>>> + * Add up the workers' JIT instrumentation from dynamic shared memory.
>>> + */
>>> +static void
>>> +ExecParallelRetrieveJitInstrumentation(PlanState *planstate,
>>> + SharedJitInstrumentation *shared_jit)
>>> +{
>>> + int n;
>>> + JitContext *jit = planstate->state->es_jit;
>>> +
>>> + /* If the leader hasn't yet created a jit context, allocate one now. */
>>> + if (!jit)
>>> + {
>>> + planstate->state->es_jit = jit =
>>> + jit_create_context(planstate->state->es_jit_flags);
>>> + }
>>
>> Wouldn't it be better to move the jit instrumentation to outside of the
>> context, to avoid having to do this? Or just cope with not having
>> instrumentation for the leader in this case? We'd kinda need to deal
>> with failure to create one anyway?
>
> Yeah, I think taking out the instrumentation out of the context looks
> better. Will work on that.

Not yet done this. Will do now.

--
Thanks,
-Amit Khandekar
EnterpriseDB Corporation
The Postgres Database Company

Attachment Content-Type Size
jit_instr_account_workers_v2.patch application/octet-stream 19.6 KB
jit_explain.out application/octet-stream 5.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Damir Colak 2018-09-14 11:21:33 Re: Code of Conduct plan
Previous Message Thomas Munro 2018-09-14 11:16:05 Re: [PATCH] Fix for infinite signal loop in parallel scan