Re: Why is a hash join preferred when it does not fit in work_mem

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dimitrios Apostolou <jimis(at)gmx(dot)net>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Why is a hash join preferred when it does not fit in work_mem
Date: 2023-01-14 16:17:24
Message-ID: 1007582.1673713044@sss.pgh.pa.us
Lists: pgsql-general

Dimitrios Apostolou <jimis(at)gmx(dot)net> writes:
> Please correct me if I'm wrong, as I'm a newcomer to PostgreSQL, but here
> is how I understand things according to posts I've read, and classical
> algorithms:

> + The Hash Join is fastest when one side fits in work_mem. Then on one
> hand you have a hash table lookup (amortized O(1)) and on the other
> hand, if the table has M rows that do not fit in memory, you have
> sequential reads from the disk (given low fragmentation of the table or
> index files): for every line you read from the disk, you look up the key
> in the hash table.

> If the hash table does not fit in RAM then the cost becomes prohibitive.
> Every lookup is a random access possibly hitting the disk. The total
> cost should be random_page_cost * M.

That would be true of a simple hash join, but Postgres uses batched
hash joins: we split up the hash key space into subsets, where hopefully
each subset includes few enough inner-side rows to fit into work_mem.
While this can go wrong given pathological distribution of the inner-side
keys, it does mean that the join can perform well even when the inner
side is much larger than work_mem. So it's not the case that the planner
will simply disregard hash joins beyond work_mem. It will apply a cost
penalty for the predicted batching overhead; but that can still come out
cheaper than merge join, because the sorting needed for merge is generally
also far from cheap.
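
For illustration (the table and column names below are hypothetical), the
batching is visible in EXPLAIN ANALYZE output: the Hash node reports a
"Batches" figure, and anything greater than 1 means the inner side was
split so that each batch fits in work_mem rather than the hash join being
abandoned.

    -- Hypothetical tables, shown only to illustrate where batching appears.
    SET work_mem = '4MB';

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM big_fact f
    JOIN big_dim d ON d.id = f.dim_id;

    -- The Hash node's line typically looks something like:
    --   Buckets: 65536  Batches: 16  Memory Usage: 3941kB
    -- Batches > 1 means the inner side was written out and re-read in
    -- work_mem-sized pieces, with a corresponding cost penalty.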

> So I would expect an increased random_page_cost to benefit the Merge Join
> algorithm. And since my setup involves spinning disks, it does make sense
> to increase it.

What is probably really happening is that random_page_cost affects the
estimated cost of performing the sort using an index scan instead of
a bespoke sort step. AFAIR, cost_sort doesn't consider random_page_cost
at all, and neither does cost_hashjoin.
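
As a rough way to check this yourself (again with hypothetical table and
column names), compare the estimated costs of the same query under two
random_page_cost settings; the node whose estimate moves is the index scan
that would feed a merge join, not the explicit Sort or the Hash Join:

    -- Hypothetical query; the point is which node's cost estimate changes.
    SET random_page_cost = 1.1;
    EXPLAIN SELECT * FROM big_fact f JOIN big_dim d ON d.id = f.dim_id;

    SET random_page_cost = 4.0;
    EXPLAIN SELECT * FROM big_fact f JOIN big_dim d ON d.id = f.dim_id;

    -- An Index Scan supplying pre-sorted input to a Merge Join gets
    -- costlier as random_page_cost rises, while an explicit Sort or a
    -- Hash Join is costed mainly from seq_page_cost, cpu_* costs and
    -- work_mem, so its estimate barely moves.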

regards, tom lane
