Re: Really bad blowups with hash outer join and nulls

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Really bad blowups with hash outer join and nulls
Date: 2015-02-16 14:44:45
Message-ID: 54E2025D.8070505@2ndquadrant.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 16.2.2015 03:38, Andrew Gierth wrote:
>>>>>> "Tomas" == Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
>>>>>> writes:
>
> Tomas> Improving the estimates is always good, but it's not going
> to Tomas> fix the case of non-NULL values (it shouldn't be all
> that Tomas> difficult to create such examples with a value whose
> hash starts Tomas> with a bunch of zeroes).
>
> Right now, I can't get it to plan such an example, because (a) if
> there are no stats to work from then the planner makes fairly
> pessimistic assumptions about hash bucket filling, and (b) if
> there _are_ stats to work from, then a frequently-occurring
> non-null value shows up as an MCV and the planner takes that into
> account to calculate bucketsize.
>
> The problem could only be demonstrated for NULLs because the
> planner was ignoring NULL for the purposes of estimating
> bucketsize, which is correct for all join types except RIGHT and
> FULL (which, iirc, are more recent additions to the hashjoin
> repertoire).

Oh, right, the estimate fix is probably sufficient then.

>
> If you want to try testing it, you may find this useful:
>
> select i, hashint8(i) from unnest(array[1474049294, -1779024306,
-1329041947]) u(i);
> i | hashint8 -------------+---------- 1474049294 | 0
> -1779024306 | 0 -1329041947 | 0 (3 rows)
>
> (those are the only three int4 values that hash to exactly 0)
>
> It's probably possible to construct pathological cases by finding
> a lot of different values with zeros in the high bits of the hash,
> but that's something that wouldn't be likely to happen by chance.

Yeah, it's probably possible, but it's admittedly considerably harder
than I initially thought. For example it could be possible to create
the table with no MCV values but sorted so that all the initial values
have hashvalue=0, triggering (growEnabled=false). But that's rather
unlikely to happen in practice I guess.

A more likely failure scenario is a hash join higher up the plan,
processing results of other joins etc. In that case the estimates will
be tricky, although the planner chooses quite pessimistic defaults in
those cases.

--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Dmitry Dolgov 2015-02-16 15:28:13 Re: GSoC 2015 - mentors, students and admins.
Previous Message Thom Brown 2015-02-16 12:54:07 Re: GSoC 2015 - mentors, students and admins.