Re: Avoiding hash join batch explosions with extreme skew and weird stats

From: Melanie Plageman <melanieplageman(at)gmail(dot)com>
To: Thomas Munro <thomas(dot)munro(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Avoiding hash join batch explosions with extreme skew and weird stats
Date: 2019-05-18 00:14:59
Message-ID: CAAKRu_aoUJi7eWREto-QFDdroK67w=UeOU1=mf71N87JwJ4iPA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, May 16, 2019 at 3:22 PM Thomas Munro <thomas(dot)munro(at)gmail(dot)com> wrote:

> Admittedly I don't have a patch, just a bunch of handwaving. One
> reason I haven't attempted to write it is because although I know how
> to do the non-parallel version using a BufFile full of match bits in
> sync with the tuples for outer joins, I haven't figured out how to do
> it for parallel-aware hash join, because then each loop over the outer
> batch could see different tuples in each participant. You could use
> the match bit in HashJoinTuple header, but then you'd have to write
> all the tuples out again, which is more IO than I want to do. I'll
> probably start another thread about that.
>
>
Could you explain more about the implementation you are suggesting?

Specifically, what do you mean "BufFile full of match bits in sync with the
tuples for outer joins?"

Is the implementation you are thinking of one which falls back to NLJ on a
batch-by-batch basis decided during the build phase?
If so, why do you need to keep track of the outer tuples seen?
If you are going to loop through the whole outer side for each tuple on the
inner side, it seems like you wouldn't need to.

Could you make an outer "batch" which is the whole of the outer relation?
That
is, could you do something like: when hashing the inner side, if
re-partitioning
is resulting in batches that will overflow spaceAllowed, could you set a
flag on
that batch use_NLJ and when making batches for the outer side, make one
"batch"
that has all the tuples from the outer side which the inner side batch
which was
flagged will do NLJ with.

--
Melanie Plageman

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ashutosh Sharma 2019-05-18 00:44:15 Re: Passing CopyMultiInsertInfo structure to CopyMultiInsertInfoNextFreeSlot()
Previous Message Andres Freund 2019-05-18 00:01:29 Re: Pluggable Storage - Andres's take