Re: [PATCH] Resolve Parallel Hash Join Performance Issue

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: "Deng, Gang" <gang(dot)deng(at)intel(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Resolve Parallel Hash Join Performance Issue
Date: 2020-01-27 02:09:45
Message-ID: CA+hUKGJjFMQTrM9bXLTYV=ZCsRf25XK34xTPtn5jjmN4CKbukQ@mail.gmail.com
Lists: pgsql-hackers

On Tue, Jan 21, 2020 at 6:20 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:
> On Fri, Jan 10, 2020 at 1:52 PM Deng, Gang <gang(dot)deng(at)intel(dot)com> wrote:
> > Thank you for the comment. Yes, I agree with the alternative of using '(!parallel)', so that there is no need to test the bit. Will someone submit a patch for it accordingly?
>
> Here's a patch like that.

Pushed. Thanks again for the report!

I didn't try the TPC-DS query, but could see a small improvement from
this on various simple queries, especially with a fairly small hash
table and a large outer relation, when many cores are probing.

(Off topic for this thread, but after burning a few hours on a 72-way
box investigating various things including this, I was reminded of the
performance drop-off for joins with large hash tables that happens
somewhere around 8-16 workers. That's because we can't give 32KB
chunks out fast enough, and if you increase the chunk size it helps
only a bit. That really needs some work, perhaps a separation of
reservation and allocation, so that multiple segments can be created
in parallel while still respecting memory limits. The other thing I
was reminded of: FreeBSD blows Linux out of
the water on big parallel hash joins on identical hardware; I didn't
dig further today but I suspect this may be down to lack of huge pages
(TLB misses), and perhaps also those pesky fallocate() calls. I'm
starting to wonder if we should have a new GUC shared_work_mem that
reserves a wodge of shm in the main region, from which we would hand
out 'fast DSM segments', or some other higher-level abstraction that's
wired into the resource release system; they would benefit from
huge_pages=try on Linux, they'd be entirely allocated (in the VM
sense) and there'd be no system calls, though admittedly there'd be
more ways for things to go wrong...)
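A minimal, purely illustrative sketch of the "separation of reservation and allocation" idea: a shared byte budget is reserved with a lock-free compare-and-swap loop, and only after that succeeds does each worker perform the expensive allocation, without holding any lock. MemBudget, budget_init(), budget_try_reserve(), budget_release() and reserve_and_allocate() are hypothetical names, not PostgreSQL APIs, and malloc() merely stands in for creating a DSM segment.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical shared budget; not a PostgreSQL structure. */
typedef struct MemBudget
{
    size_t          limit;      /* total bytes workers may hold */
    atomic_size_t   reserved;   /* bytes promised to workers so far */
} MemBudget;

static void
budget_init(MemBudget *b, size_t limit)
{
    b->limit = limit;
    atomic_init(&b->reserved, 0);
}

/* Step 1: reservation.  Cheap and lock-free; the only contended part. */
static bool
budget_try_reserve(MemBudget *b, size_t size)
{
    size_t      cur = atomic_load(&b->reserved);

    do
    {
        /* Invariant: cur <= limit, so this subtraction cannot wrap. */
        if (size > b->limit - cur)
            return false;       /* would exceed the limit */
        /* On failure, cur is reloaded with the current value. */
    } while (!atomic_compare_exchange_weak(&b->reserved, &cur, cur + size));

    return true;
}

static void
budget_release(MemBudget *b, size_t size)
{
    size_t      old = atomic_fetch_sub(&b->reserved, size);

    assert(old >= size);        /* releasing more than reserved is a bug */
    (void) old;                 /* unused when NDEBUG is defined */
}

/*
 * Step 2: allocation.  Runs with nothing held, so many workers can do
 * the slow part (malloc here, standing in for segment creation) at the
 * same time while the budget still bounds the total.
 */
static void *
reserve_and_allocate(MemBudget *b, size_t size)
{
    void       *p;

    if (!budget_try_reserve(b, size))
        return NULL;
    p = malloc(size);
    if (p == NULL)
        budget_release(b, size);    /* hand the reservation back */
    return p;
}
```

The design choice is that the limit check is the only serialized step; any system calls made while creating the memory happen outside it.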
