Quick Links

Re: Query plan prefers hash join when nested loop is much faster

From:	David Rowley <dgrowleyml(at)gmail(dot)com>
To:	iulian dragos <iulian(dot)dragos(at)databricks(dot)com>
Cc:	pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject:	Re: Query plan prefers hash join when nested loop is much faster
Date:	2020-08-24 22:26:49
Message-ID:	CAApHDvq7Q+CNtkHqbpsx9jX0-gmyyZisMg7cWGjq7q72cmx9ow@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-general

On Sat, 22 Aug 2020 at 00:35, iulian dragos
<iulian(dot)dragos(at)databricks(dot)com> wrote:
> I am trying to understand why the query planner insists on using a hash join, and how to make it choose the better option, which in this case would be a nested loop.

> | -> Index Scan using test_result_module_result_id_idx on test_result (cost=0.57..6911.17 rows=4331 width=12) (actual time=0.002..0.002 rows=1 loops=14824) |
> | Index Cond: (module_result_id = module_result.id) |

You might want to check if the pg_stats view reports a realistic
n_distinct value for test_result.module_result_id. If the
pg_class.retuples is correct for that relation then that would
indicate the n_distinct estimate is about 115000. Going by the number
of rows you've mentioned it would appear a more realistic value for
that would be -0.4. which is 0 - 1 / (500000000 / 200000000.0).
However, that's assuming each module_result has a test_result. You
could run a SELECT COUNT(DISTINCT module_result_id) FROM test_result;
to get a better idea.

If ANALYZE is not getting you a good value for n_distinct, then you
can overwrite it. See [1], search for n_distinct.

David

[1] https://www.postgresql.org/docs/current/sql-altertable.html

In response to

Query plan prefers hash join when nested loop is much faster at 2020-08-21 12:35:29 from iulian dragos

Responses

Re: Query plan prefers hash join when nested loop is much faster at 2020-08-25 10:10:05 from iulian dragos

Browse pgsql-general by date

	From	Date	Subject
Next Message	Justin Pryzby	2020-08-25 00:06:25	Re: Row estimates for empty tables
Previous Message	Dirk Krautschick	2020-08-24 21:17:36	Most effective and fast way to load few Tbyte of data from flat files into postgresql