Re: Avoiding hash join batch explosions with extreme skew and weird stats

From: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jesse Zhang <sbjesse(at)gmail(dot)com>, dkimura(at)pivotal(dot)io
Subject: Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date: 2020-04-30 20:59:35
Message-ID: CA+hUKG+JHKQRJT_u+sYwX5MZfOVG_JKTtNAr0G2QvNZDL_+XGg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, May 1, 2020 at 2:30 AM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
> I made a few edits to the message and threw it into a draft patch (on
> top of master, of course). I didn't want to junk up peoples' inboxes, so
> I didn't start a separate thread, but, it will be pretty hard to
> collaboratively edit the comment/ever register it for a commitfest if it
> is wedged into this thread. What do you think?

+1, this is a good description and I'm sure you're right about the
name of the algorithm. It's a "hybrid" between a simple no partition
hash join, and partitioning like the Grace machine, since batch 0 is
processed directly without touching the disk.

You mention that PHJ finalises the number of batches during build
phase while SHJ can extend it later. There's also a difference in the
probe phase: although inner batch 0 is loaded into the hash table
directly and not written to disk during the build phase (= classic
hybrid, just like the serial algorithm), outer batch 0 *is* written
out to disk at the start of the probe phase (unlike classic hybrid at
least as we have it for serial hash join). That's because I couldn't
figure out how to begin emitting tuples before partitioning was
finished, without breaking the deadlock-avoidance programming rule
that you can't let the program counter escape from the node when
someone might wait for you. So maybe it's erm, a hybrid between
hybrid and Grace...

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2020-04-30 22:06:23 Re: design for parallel backup
Previous Message Andres Freund 2020-04-30 19:52:22 Re: design for parallel backup