Re: effect on planner of turning a subquery to a table, sql function returning table

From: Thierry Henrio <thierry(dot)henrio(at)gmail(dot)com>
To: David Rowley <dgrowleyml(at)gmail(dot)com>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: effect on planner of turning a subquery to a table, sql function returning table
Date: 2024-04-12 14:58:23
Message-ID: CAMPYKo0FouB-HZ1k-_Ur2v+kK71q0T5icQGrp+SPbQJGq0H2Rw@mail.gmail.com
Lists: pgsql-general

Thanks David!

Setting jit to 0:

GroupAggregate  (cost=401037.82..503755.82 rows=1467400 width=124) (actual time=56.603..68.908 rows=4670 loops=1)
  Group Key: t.device_id, t.date, t.start_time, t.end_time, (timerange(((t_1.value ->> 0))::time without time zone, ((t_1.value ->> 1))::time without time zone))
  Buffers: shared hit=3067
  ->  Sort  (cost=401037.82..404706.32 rows=1467400 width=96) (actual time=56.583..57.847 rows=22908 loops=1)
        Sort Key: t.device_id, t.date, t.start_time, t.end_time, (timerange(((t_1.value ->> 0))::time without time zone, ((t_1.value ->> 1))::time without time zone)), t.rank
        Sort Method: quicksort  Memory: 3229kB
        Buffers: shared hit=3067
        ->  Nested Loop  (cost=2.99..100268.62 rows=1467400 width=96) (actual time=0.076..31.219 rows=22908 loops=1)
              Buffers: shared hit=3067
              ->  Nested Loop  (cost=2.98..55962.20 rows=14674 width=64) (actual time=0.055..6.589 rows=11454 loops=1)
                    Buffers: shared hit=3067
                    ->  Nested Loop  (cost=2.54..39.31 rows=500 width=40) (actual time=0.041..0.047 rows=1 loops=1)
                          Buffers: shared hit=3
                          ->  Index Scan using campaigns_pkey on campaigns c  (cost=0.28..8.30 rows=1 width=355) (actual time=0.011..0.012 rows=1 loops=1)
                                Index Cond: (id = 11870)
                                Buffers: shared hit=3
                          ->  Hash Join  (cost=2.26..26.01 rows=500 width=40) (actual time=0.028..0.031 rows=1 loops=1)
                                Hash Cond: (EXTRACT(dow FROM d.date) = ((j.dow)::integer)::numeric)
                                ->  Function Scan on generate_series d  (cost=0.01..10.01 rows=1000 width=8) (actual time=0.007..0.008 rows=1 loops=1)
                                ->  Hash  (cost=1.00..1.00 rows=100 width=64) (actual time=0.013..0.014 rows=7 loops=1)
                                      Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                      ->  Function Scan on jsonb_each j  (cost=0.00..1.00 rows=100 width=64) (actual time=0.007..0.008 rows=7 loops=1)
                    ->  Index Scan using device_timeslots_date_index on device_timeslots t  (cost=0.43..111.56 rows=29 width=32) (actual time=0.013..4.282 rows=11454 loops=1)
                          Index Cond: ((date = d.date) AND (date >= '2024-04-26'::date) AND (date <= '2024-04-26'::date))
                          Buffers: shared hit=3064
              ->  Memoize  (cost=0.01..1.01 rows=100 width=32) (actual time=0.000..0.000 rows=2 loops=11454)
                    Cache Key: j.times
                    Cache Mode: binary
                    Hits: 11453  Misses: 1  Evictions: 0  Overflows: 0  Memory Usage: 1kB
                    ->  Function Scan on jsonb_array_elements t_1  (cost=0.00..1.00 rows=100 width=32) (actual time=0.002..0.002 rows=2 loops=1)
Planning:
  Buffers: shared hit=4
Planning Time: 0.512 ms
Execution Time: 69.476 ms

So, as far as I understand (correct me if I'm wrong):

1) EXPLAIN (ANALYZE, BUFFERS) shows the JIT timing, which plain EXPLAIN did
not, thanks!
I also found
https://www.postgresql.org/docs/current/jit-decision.html#JIT-DECISION
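
For reference, reproducing the comparison only takes the session-level
switch David suggested; the final SELECT below is just a stand-in, not the
real query from this thread:

-- Show the JIT-related settings in effect (defaults: jit = on,
-- jit_above_cost = 100000, jit_inline_above_cost = 500000,
-- jit_optimize_above_cost = 500000)
SELECT name, setting FROM pg_settings WHERE name LIKE 'jit%';

-- Disable JIT for this session only, then re-run the statement;
-- the "JIT:" block disappears from the EXPLAIN (ANALYZE, BUFFERS) output
SET jit = 0;
EXPLAIN (ANALYZE, BUFFERS)
SELECT count(*) FROM generate_series(1, 1000) AS g(i);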

2) the subquery is overestimated: estimated 500 rows, actual 1, in the
innermost Nested Loop. And that error is amplified by the outer Nested
Loops:

->  Nested Loop  (cost=2.99..100268.62 rows=*1467400* width=96) (actual time=0.067..27.102 rows=*22908* loops=1)
      Buffers: shared hit=3067
      ->  Nested Loop  (cost=2.98..55962.20 rows=14674 width=64) (actual time=0.055..5.750 rows=11454 loops=1)
            Buffers: shared hit=3067
            ->  Nested Loop  (cost=2.54..39.31 rows=*500* width=40) (actual time=0.039..0.044 rows=*1* loops=1)
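
If I follow the arithmetic, the outer estimate of 1,467,400 is exactly
14,674 * 100, the 100 being the estimated row count of the
jsonb_array_elements scan under the Memoize node, so the 500-vs-1 error at
the bottom is carried all the way up the join tree.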

The scan of the temp table was overestimated too, but that is mitigated by
the Merge Join:

->  Merge Join  (cost=67127.99..67631.11 rows=*33165* width=64) (actual time=7.642..13.455 rows=*22908* loops=1)
      Merge Cond: (z.date = t.date)
      Buffers: shared hit=3064, local hit=1
      ->  Sort  (cost=78.60..81.43 rows=1130 width=40) (actual time=0.011..0.012 rows=2 loops=1)
            Sort Key: z.date
            Sort Method: quicksort  Memory: 25kB
            Buffers: local hit=1
            ->  Seq Scan on z11870 z  (cost=0.00..21.30 rows=*1130* width=40) (actual time=0.006..0.007 rows=*2* loops=1)
                  Buffers: local hit=1
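
I suspect the rows=*1130* is just the planner's default size guess for a
temp table that has never been analyzed, so running ANALYZE on it after
filling it may tighten plan (B) further. A rough sketch; the real
definition of z11870 isn't shown in this thread, so the columns below are
made up for illustration:

-- Hypothetical stand-in for the real temp table (only z.date is known
-- from the plan above; the jsonb column is invented for the example)
CREATE TEMP TABLE z11870 (date date, times jsonb);

INSERT INTO z11870
SELECT d::date, '["08:00", "12:00"]'::jsonb
FROM generate_series('2024-04-26'::date, '2024-04-26'::date,
                     interval '1 day') AS d;

-- Without this the planner falls back to a default size estimate
-- (the rows=1130 above); with it, it sees the real row count
ANALYZE z11870;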

The cost estimation is better in (B) than in (A), even though the estimated
rows of z11870 are off.

So I have options: O1) set jit = 0 or raise the JIT cost thresholds (sketch
below), O2) temp table, O3) rework the schema design (a variation on O2).
Thanks!
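
For O1 applied cluster-wide rather than per session, something like this
(the threshold values are only illustrative, not tuned):

-- Per session, as suggested:
SET jit = 0;

-- Or cluster-wide, raising the thresholds so JIT only kicks in for
-- genuinely expensive plans (defaults are 100000 / 500000 / 500000)
ALTER SYSTEM SET jit_above_cost = 1000000;
ALTER SYSTEM SET jit_inline_above_cost = 5000000;
ALTER SYSTEM SET jit_optimize_above_cost = 5000000;
SELECT pg_reload_conf();  -- reload the config file afterwards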

On Fri, Apr 12, 2024 at 1:40 PM David Rowley <dgrowleyml(at)gmail(dot)com> wrote:

> On Fri, 12 Apr 2024 at 23:27, Thierry Henrio <thierry(dot)henrio(at)gmail(dot)com> wrote:
> > JIT:
> >   Functions: 36
> >   Options: Inlining true, Optimization true, Expressions true, Deforming true
> >   Timing: Generation 1.949 ms, Inlining 28.891 ms, Optimization 207.481 ms, Emission 134.907 ms, Total 373.228 ms
> > Execution Time: 429.037 ms
>
> It looks very much like the majority of the extra time is being spent
> doing JIT compilation. This triggers for plan A but not plan B. You
> can see from:
>
> > GroupAggregate  (cost=401037.82..503755.82 rows=1467400 width=124) (actual time=416.851..426.534 rows=4670 loops=1)
>
> that the top-level row estimates are off. This makes the estimated
> cost higher than it actually is. The planner opts to have tuple
> deforming and expression evaluation JIT compiled to try to speed up
> the plan thinking it's worthwhile. It's not in this case.
>
> You can switch JIT off to try without with:
>
> SET jit=0;
>
> You might want to consider editing postgresql.conf and raising the
> jit_above_cost, jit_inline_above_cost and jit_optimize_above_cost
> values to some higher value or disable JIT completely.
>
> SELECT pg_reload_conf(); -- to reload the config file afterwards.
>
> David
>
