Re: Hybrid Hash/Nested Loop joins and caching results from subplans

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Konstantin Knizhnik <k(dot)knizhnik(at)postgrespro(dot)ru>
Cc: Andy Fan <zhihui(dot)fan1213(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Developers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Date: 2020-12-10 22:03:05
Message-ID: CAApHDvpBk5QYd_Zbh76tvomOf79hHdqV1HKVdGzs9eeKP-TJUw@mail.gmail.com
Lists: pgsql-hackers

Thanks a lot for testing this patch. It's good to see it run through a
benchmark that exercises quite a few join problems.

On Fri, 11 Dec 2020 at 05:44, Konstantin Knizhnik
<k(dot)knizhnik(at)postgrespro(dot)ru> wrote:
> For most queries performance is the same, some queries are executed
> faster but
> one query is 150 times slower:
>
>
> explain analyze SELECT MIN(chn.name) AS character,
...
> Execution Time: 523002.608 ms

> I attach file with times of query execution.

I noticed the time reported in results.csv is exactly the same as the
one in the EXPLAIN ANALYZE above. One thing to note there is that it
would be a bit fairer if the benchmark measured the execution time of
the query itself rather than the time taken to EXPLAIN ANALYZE it.

One of the reasons the patch may look less favourable here is that the
timing overhead of EXPLAIN ANALYZE increases with the number of nodes
in the plan.

To put this to the test, I used the tables and query from [1]:

# explain (analyze, costs off) select count(*) from hundredk hk inner
# join lookup l on hk.thousand = l.a;
                                                    QUERY PLAN
-----------------------------------------------------------------------------------------------------------------
 Aggregate (actual time=1891.262..1891.263 rows=1 loops=1)
   ->  Nested Loop (actual time=0.312..1318.087 rows=9990000 loops=1)
         ->  Seq Scan on hundredk hk (actual time=0.299..15.753 rows=100000 loops=1)
         ->  Result Cache (actual time=0.000..0.004 rows=100 loops=100000)
               Cache Key: hk.thousand
               Hits: 99000  Misses: 1000  Evictions: 0  Overflows: 0  Memory Usage: 3579kB
               ->  Index Only Scan using lookup_a_idx on lookup l (actual time=0.003..0.012 rows=100 loops=1000)
                     Index Cond: (a = hk.thousand)
                     Heap Fetches: 0
 Planning Time: 3.471 ms
 Execution Time: 1891.612 ms
(11 rows)

You can see here the query took 1.891 seconds to execute.

Here's the same query without EXPLAIN ANALYZE:

postgres=# \timing
Timing is on.
postgres=# select count(*) from hundredk hk inner
postgres-# join lookup l on hk.thousand = l.a;
count
---------
9990000
(1 row)

Time: 539.449 ms

Or is it more accurate to say it took just 0.539 seconds?
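
As an aside, if the benchmark does need to go through EXPLAIN ANALYZE,
then something like TIMING OFF would remove most of the per-node timing
overhead while still reporting the total execution time in the summary,
e.g.:

explain (analyze, timing off, costs off)
select count(*) from hundredk hk
inner join lookup l on hk.thousand = l.a;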

Going through the same query after disabling enable_resultcache,
enable_mergejoin and enable_nestloop in turn, I can generate the
following table, which compares the EXPLAIN ANALYZE time to the
\timing time.

postgres=# select type, ea_time, timing_time,
postgres-#        round(ea_time::numeric / timing_time::numeric, 3) as ea_overhead
postgres-# from results order by timing_time;
      type      | ea_time  | timing_time | ea_overhead
----------------+----------+-------------+-------------
 Nest loop + RC | 1891.612 |     539.449 |       3.507
 Merge join     | 2411.632 |    1008.991 |       2.390
 Nest loop      |  2484.82 |     1049.63 |       2.367
 Hash join      | 4969.284 |    3272.424 |       1.519
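
Each row of that table is just the same query re-run with EXPLAIN
ANALYZE and then with \timing after switching off the relevant plan
types, along the lines of:

set enable_resultcache = off; -- the GUC added by this patch
set enable_mergejoin = off;
set enable_nestloop = off;    -- with all three off only Hash Join remains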

Result Cache will be hit a bit harder by this problem due to it having
additional nodes in the plan. The Hash Join query seems to suffer much
less from this problem.

However, having said that, it's certainly not the entire problem here:

Hits: 5 Misses: 156294 Evictions: 0 Overflows: 0 Memory Usage: 9769kB

The planner must have thought there'd be more hits than that, or it
wouldn't have thought Result Caching would be a good plan. Estimating
the cache hit ratio using n_distinct becomes much less reliable when
there are joins and filters involved, a.k.a. the real world.
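
For what it's worth, the n_distinct estimate that the costing has to
work from can be seen in pg_stats, e.g. for the lookup table from the
earlier example:

select attname, n_distinct
from pg_stats
where tablename = 'lookup' and attname = 'a';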

David

[1] https://www.postgresql.org/message-id/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com
