From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Dimitrios Apostolou <jimis(at)gmx(dot)net>
Cc: David Rowley <dgrowleyml(at)gmail(dot)com>, pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Why is a hash join preferred when it does not fit in work_mem
Date: 2023-01-14 16:17:24
Message-ID: 1007582.1673713044@sss.pgh.pa.us
Lists: pgsql-general
Dimitrios Apostolou <jimis(at)gmx(dot)net> writes:
> Please correct me if I'm wrong, as I'm a newcomer to PostgreSQL, but here
> is how I understand things according to posts I've read, and classical
> algorithms:
> + The Hash Join is fastest when one side fits in work_mem. Then on one
> hand you have a hash table lookup (amortized O(1)) and on the other
> hand, if the table has M rows that do not fit in memory, you have
> sequential reads from the disk (given low fragmentation of the table or
> index files): for every row you read from the disk, you look up the key
> in the hash table.
> If the hash table does not fit in RAM then the cost becomes prohibitive.
> Every lookup is a random access possibly hitting the disk. The total
> cost should be random_page_cost * M.
That would be true of a simple hash join, but Postgres uses batched
hash joins: we split up the hash key space into subsets, where hopefully
each subset includes few enough inner-side rows to fit into work_mem.
While this can go wrong given pathological distribution of the inner-side
keys, it does mean that the join can perform well even when the inner
side is much larger than work_mem. So it's not the case that the planner
will simply disregard hash joins beyond work_mem. It will apply a cost
penalty for the predicted batching overhead; but that can still come out
cheaper than merge join, because the sorting needed for merge is generally
also far from cheap.
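For intuition, here is a minimal Python sketch of the batching idea (a Grace-style hash join; this is a toy illustration, not PostgreSQL's actual implementation, and `n_batches` is simply assumed to be chosen so each inner batch fits in memory):

```python
import collections

def batched_hash_join(inner, outer, key_inner, key_outer, n_batches):
    """Grace-style batched hash join: partition both inputs by a hash of
    the join key, then build an in-memory hash table per batch."""
    # Phase 1: split the hash key space into n_batches subsets.
    inner_batches = [[] for _ in range(n_batches)]
    outer_batches = [[] for _ in range(n_batches)]
    for row in inner:
        inner_batches[hash(key_inner(row)) % n_batches].append(row)
    for row in outer:
        outer_batches[hash(key_outer(row)) % n_batches].append(row)
    # Phase 2: join batch by batch.  Matching keys always hash into the
    # same batch, so each hash table only needs to hold one inner batch,
    # not the whole inner side.
    result = []
    for ib, ob in zip(inner_batches, outer_batches):
        table = collections.defaultdict(list)
        for row in ib:
            table[key_inner(row)].append(row)
        for row in ob:
            for match in table.get(key_outer(row), ()):
                result.append((match, row))
    return result
```

The batching overhead Tom mentions corresponds to phase 1 here: in a real executor, batches other than the current one are spilled to temporary files and re-read later, which is extra sequential I/O but avoids random probes into an overflowing hash table.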
> So I would expect an increased random_page_cost to benefit the Merge Join
> algorithm. And since my setup involves spinning disks, it does make sense
> to increase it.
What is probably really happening is that random_page_cost affects the
estimated cost of performing the sort using an index scan instead of
a bespoke sort step. AFAIR, cost_sort doesn't consider random_page_cost
at all, and neither does cost_hashjoin.
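A toy model of that effect (assumed formulas and numbers chosen only for illustration; these are not PostgreSQL's cost_sort/cost_index equations, though 0.0025 is the default cpu_operator_cost):

```python
import math

def indexscan_sort_cost(pages, random_page_cost):
    # Producing sorted output via an index scan pays roughly one random
    # page fetch per heap page visited, so it scales with random_page_cost.
    return pages * random_page_cost

def explicit_sort_cost(tuples, cpu_operator_cost=0.0025):
    # An explicit sort is dominated by CPU comparison cost, roughly
    # O(N log N); random_page_cost does not appear at all.
    return 2.0 * cpu_operator_cost * tuples * math.log2(max(tuples, 2))

pages, tuples = 10_000, 1_000_000
for rpc in (1.0, 4.0, 10.0):
    idx = indexscan_sort_cost(pages, rpc)
    srt = explicit_sort_cost(tuples)
    print(f"random_page_cost={rpc}: index-scan path {idx:.0f} vs explicit sort {srt:.0f}")
```

The point of the sketch: raising random_page_cost inflates only the index-scan route to sorted input, so it can flip the planner away from "merge join fed by index scan" without making the hash join itself look any cheaper or costlier.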
regards, tom lane