Re: Why is a hash join preferred when it does not fit in work_mem

From: David Rowley <dgrowleyml(at)gmail(dot)com>
To: Dimitrios Apostolou <jimis(at)gmx(dot)net>
Cc: pgsql-general(at)lists(dot)postgresql(dot)org
Subject: Re: Why is a hash join preferred when it does not fit in work_mem
Date: 2023-01-13 01:38:26
Message-ID: CAApHDvpPVydoNkEqLyBSbjWq8kq8M7YWdkA44rTeA2MNaO3jsw@mail.gmail.com

On Fri, 13 Jan 2023 at 07:33, Dimitrios Apostolou <jimis(at)gmx(dot)net> wrote:
>
> I have a very simple NATURAL JOIN that does not fit in the work_mem. Why
> does the query planner prefer a hash join that needs 361s, while with a
> sort operation and a merge join it takes only 13s?

It's simply that the Hash Join plan appears cheaper based on the
costs that the planner has calculated.

A better question to ask would be: where are the costs inaccurate, and why?
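
One way to check is to temporarily disable the Hash Join and compare
the planner's estimate for the next-best plan. A rough sketch, where
t1 and t2 stand in for the tables in your NATURAL JOIN:

    SET enable_hashjoin = off;  -- planner falls back to its next-cheapest plan
    EXPLAIN ANALYZE SELECT * FROM t1 NATURAL JOIN t2;  -- placeholder query
    RESET enable_hashjoin;

If the Merge Join's estimated cost is only a little above the Hash
Join's, a small costing error is enough to flip the planner's choice.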

One thing I noticed in your EXPLAIN ANALYZE output is that the Index
Scan on workitem_ids was costed higher than the Seq Scan, yet was
faster.

> -> Seq Scan on public.workitem_ids (cost=0.00..59780.19 rows=1373719 width=237) (actual time=0.026..1912.312 rows=1373737 loops=1)

> -> Index Scan using workitem_ids_pkey on public.workitem_ids (cost=0.43..81815.86 rows=1373719 width=237) (actual time=0.111..1218.363 rows=1373737 loops=1)

Perhaps the Seq scan is doing more actual I/O than the index scan is.
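
EXPLAIN's BUFFERS option would show that directly; for example, on
the table from your plan:

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT * FROM public.workitem_ids;

Each node then reports shared hit/read counts, where "read" is the
number of pages that had to be fetched from outside shared_buffers.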

> The low work_mem and the disabled memoization are set on purpose, in order
> to simplify a complex query, while reproducing the same problem that I
> experienced there. This result is the simplest query I could get, where
> the optimizer does not go for a faster merge join.
>
> From my point of view a merge join is clearly faster, because the hash
> table does not fit in memory and I expect a hash join to do a lot of
> random I/O. But the query planner does not see that, and increasing
> random_page_cost does not help either. In fact the opposite happens: the
> merge join gets a higher cost difference to the hash join, as I increase
> the random page cost!

I'd expect reducing random_page_cost to make the Merge Join cheaper,
as that's where the Index Scan is. I'm not quite sure where you think
the random I/O is coming from in a batched hash join; the per-batch
temp files are written and read sequentially.
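
A session-local experiment, again with a placeholder query, would
show whether the Merge Join starts winning as that cost drops:

    SET random_page_cost = 1.1;  -- default is 4.0; lower values favour index scans
    EXPLAIN SELECT * FROM t1 NATURAL JOIN t2;  -- placeholder query
    RESET random_page_cost;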

It would be interesting to see the same plans with track_io_timing
enabled. It's possible that there's less *actual* I/O going on with
the Merge Join plan vs the Hash Join plan. Since we do buffered I/O,
without track_io_timing, we don't know if the read buffers resulted in
an actual disk read or a read from the kernel buffers.
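
For example (placeholder query again; note that SET track_io_timing
normally requires superuser):

    SET track_io_timing = on;
    EXPLAIN (ANALYZE, BUFFERS)         -- each node now also reports
    SELECT * FROM t1 NATURAL JOIN t2;  -- "I/O Timings: read=... write=..."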

David
