Re: Avoiding hash join batch explosions with extreme skew and weird stats

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, Jesse Zhang <sbjesse(at)gmail(dot)com>, dkimura(at)pivotal(dot)io
Subject: Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date: 2020-04-29 02:03:53
Message-ID: CAAKRu_Z2qKMvdD3=J7-Gk1-0eu94NSHNDkL5E4EnGEdS=hTX0w@mail.gmail.com
Lists: pgsql-hackers

I've attached a patch which should address some of the previous feedback
about code complexity. Two of my co-workers and I wrote what is
essentially a new prototype of the idea. It uses the main state machine
to route the emission of unmatched tuples instead of introducing a
separate state. The fallback logic is also more developed.

In addition to many assorted TODOs in the code, there are a few major
projects left:
- Batch 0 falling back
- Stripe barrier deadlock
- Performance improvements and testing

I will address the stripe barrier deadlock here. David is going to send
a separate email about batch 0 falling back.

There is a deadlock hazard in parallel hashjoin (pointed out by Thomas
Munro in the past): workers attached to the stripe_barrier emit tuples
and then wait on that barrier. I believe this can be addressed, starting
with this relatively unoptimized solution:
- after probing a stripe in a batch, a worker sets the status of that
batch to "tentatively done" and saves the stripe_barrier phase
- if that worker is not the only worker attached to that batch, it
detaches from both stripe and batch barriers and moves on to other
batches
- if that worker is the only worker attached to the batch, it will
proceed to load the next stripe of that batch, and, once it has
finished loading, it will set the status of the batch back to "not
done" for itself
- when a worker that moved on encounters that batch again, if the
stripe_barrier phase has not moved forward, it will mark that batch as
done for itself. If the stripe_barrier phase has moved forward, it can
join in probing this batch for the current stripe.
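
To make the steps above concrete, here is a rough, single-threaded sketch
of the per-worker decisions. It is not code from the attached patch: the
types and functions (BatchStatus, SharedBatch, WorkerBatchView,
finish_stripe, revisit_batch) are hypothetical names invented for
illustration, a plain integer stands in for the stripe_barrier phase, and
the atomicity that a real Barrier would provide is ignored.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-worker status of a batch (illustrative names only). */
typedef enum
{
    BATCH_NOT_DONE,
    BATCH_TENTATIVELY_DONE,
    BATCH_DONE
} BatchStatus;

/* Hypothetical shared state; stripe_phase stands in for the barrier phase. */
typedef struct
{
    int stripe_phase;   /* advances each time a new stripe is loaded */
    int attached;       /* number of workers attached to this batch */
} SharedBatch;

/* Hypothetical worker-local view of one batch. */
typedef struct
{
    BatchStatus status;
    int         saved_phase;    /* phase seen when the worker last probed */
} WorkerBatchView;

/*
 * Called after a worker finishes probing a stripe. Returns true if the
 * worker stays on the batch (it was the only one attached and has loaded
 * the next stripe), false if it detached and should try other batches.
 */
bool
finish_stripe(SharedBatch *batch, WorkerBatchView *view)
{
    view->status = BATCH_TENTATIVELY_DONE;
    view->saved_phase = batch->stripe_phase;

    if (batch->attached > 1)
    {
        /* Not alone: detach instead of waiting on the barrier. */
        batch->attached--;
        return false;
    }

    /* Only worker attached: load the next stripe, then reopen the batch. */
    batch->stripe_phase++;
    view->status = BATCH_NOT_DONE;
    return true;
}

/*
 * Called when a worker comes back to a batch it left tentatively done.
 * Returns true if it rejoins the batch to probe the current stripe.
 */
bool
revisit_batch(SharedBatch *batch, WorkerBatchView *view)
{
    assert(view->status == BATCH_TENTATIVELY_DONE);

    if (batch->stripe_phase == view->saved_phase)
    {
        /* No new stripe appeared: this worker is done with the batch. */
        view->status = BATCH_DONE;
        return false;
    }

    /* The phase moved forward: rejoin and probe the current stripe. */
    batch->attached++;
    view->status = BATCH_NOT_DONE;
    return true;
}
```

The point of the sketch is only that no worker ever waits on the barrier
while other attached workers may still be emitting tuples: a worker either
detaches or is the sole participant. A real implementation would need the
attach count check and the phase comparison to be race-free.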

Attachment Content-Type Size
v6-0001-Implement-Adaptive-Hashjoin.patch text/x-patch 131.8 KB
