Re: Avoiding hash join batch explosions with extreme skew and weird stats

From: David Kimura <david(dot)g(dot)kimura(at)gmail(dot)com>
To: Melanie Plageman <melanieplageman(at)gmail(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jesse Zhang <sbjesse(at)gmail(dot)com>, dkimura(at)pivotal(dot)io
Subject: Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date: 2020-04-29 23:44:53
Message-ID: CAHnPFjSV8u=D85RnorugR-5-RR73msDghuQ1sRRnwbVa6S-Oyg@mail.gmail.com
Lists: pgsql-hackers

On Wed, Apr 29, 2020 at 4:39 PM Melanie Plageman
<melanieplageman(at)gmail(dot)com> wrote:
>
> In addition to many assorted TODOs in the code, there are a few major
> projects left:
> - Batch 0 falling back
> - Stripe barrier deadlock
> - Performance improvements and testing
>

Batch 0 never spills. That behavior is an artifact of the existing design,
which special-cases batch 0 as an optimization: its tuples go straight into
the initial hash table, so the batch skips the load step and doesn't need a
batch file.

However, in the pathological case where all tuples hash to batch 0, there is no
way to redistribute those tuples to other batches. So the existing hash join
implementation allows work_mem to be exceeded for batch 0.
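
To illustrate why adding batches cannot help here, a minimal sketch in Python
(not PostgreSQL code). The helper names batch_for and batch_sizes are made up
for this example, and the mapping from hash value to batch number is a
simplification that only assumes nbatch is a power of two.

```python
def batch_for(hashvalue, nbatch):
    # Simplified assumption: pick the batch from the low bits of the hash.
    # nbatch must be a power of two. This is not PostgreSQL's exact bit choice.
    return hashvalue & (nbatch - 1)


def batch_sizes(hashvalues, nbatch):
    # Count how many tuples land in each batch for a given batch count.
    sizes = [0] * nbatch
    for h in hashvalues:
        sizes[batch_for(h, nbatch)] += 1
    return sizes


# Extreme skew: every tuple carries the same hash value.
skewed = [0] * 1000

for nbatch in (1, 2, 4, 8):
    # Batch 0 keeps all 1000 tuples no matter how often nbatch doubles.
    print(nbatch, batch_sizes(skewed, nbatch)[0])
```

With evenly spread hash values the same doubling would halve each batch, which
is why repartitioning normally works and only fails under this kind of skew.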

In the adaptive hash join approach, there is another way to deal with a batch
that exceeds work_mem. If increasing the number of batches does not work, then
the batch can be split into stripes that will not exceed work_mem. Doing this
requires spilling the excess tuples to batch files. The following patch adds
logic to create a batch 0 file for serial hash join so that even in the
pathological case we do not need to exceed work_mem.
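
A rough sketch of the striping idea in Python, not taken from the patch. It
rests on several assumptions: work_mem is counted in tuples rather than bytes,
the probe side is re-read once per stripe, only inner-join semantics are
handled, and the names stripes and join_with_stripes are hypothetical.

```python
def stripes(tuples, work_mem_tuples):
    # Cut one oversized batch into consecutive chunks that each fit in memory.
    # In a real executor everything past the first chunk would sit in a batch file.
    for start in range(0, len(tuples), work_mem_tuples):
        yield tuples[start:start + work_mem_tuples]


def join_with_stripes(inner, outer, work_mem_tuples):
    # inner and outer are lists of (key, payload) pairs belonging to one batch.
    results = []
    for stripe in stripes(inner, work_mem_tuples):
        # Build a hash table holding only this stripe, so memory stays bounded.
        table = {}
        for key, payload in stripe:
            table.setdefault(key, []).append(payload)
        # Probe with the whole outer side again for every stripe.
        for okey, opayload in outer:
            for ipayload in table.get(okey, ()):
                results.append((okey, ipayload, opayload))
    return results


# Ten inner tuples share one key, but only four fit in memory at a time.
inner = [(1, "i%d" % n) for n in range(10)]
outer = [(1, "o0"), (1, "o1"), (2, "o2")]
matches = join_with_stripes(inner, outer, 4)
print(len(matches))
```

The cost of staying within the memory limit in this sketch is one extra pass
over the outer side per stripe.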

Thanks,
David

Attachment Content-Type Size
v6-0002-Implement-fallback-of-batch-0-for-serial-adaptive.patch application/octet-stream 4.8 KB
