From: | Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> |
---|---|
To: | Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> |
Cc: | pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Really bad blowups with hash outer join and nulls |
Date: | 2015-02-16 14:44:45 |
Message-ID: | 54E2025D.8070505@2ndquadrant.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On 16.2.2015 03:38, Andrew Gierth wrote:
>>>>>> "Tomas" == Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
>>>>>> writes:
>
> Tomas> Improving the estimates is always good, but it's not going
> to Tomas> fix the case of non-NULL values (it shouldn't be all
> that Tomas> difficult to create such examples with a value whose
> hash starts Tomas> with a bunch of zeroes).
>
> Right now, I can't get it to plan such an example, because (a) if
> there are no stats to work from then the planner makes fairly
> pessimistic assumptions about hash bucket filling, and (b) if
> there _are_ stats to work from, then a frequently-occurring
> non-null value shows up as an MCV and the planner takes that into
> account to calculate bucketsize.
>
> The problem could only be demonstrated for NULLs because the
> planner was ignoring NULL for the purposes of estimating
> bucketsize, which is correct for all join types except RIGHT and
> FULL (which, iirc, are more recent additions to the hashjoin
> repertoire).
Oh, right, the estimate fix is probably sufficient then.
>
> If you want to try testing it, you may find this useful:
>
> select i, hashint8(i) from unnest(array[1474049294, -1779024306,
-1329041947]) u(i);
> i | hashint8 -------------+---------- 1474049294 | 0
> -1779024306 | 0 -1329041947 | 0 (3 rows)
>
> (those are the only three int4 values that hash to exactly 0)
>
> It's probably possible to construct pathological cases by finding
> a lot of different values with zeros in the high bits of the hash,
> but that's something that wouldn't be likely to happen by chance.
Yeah, it's probably possible, but it's admittedly considerably harder
than I initially thought. For example it could be possible to create
the table with no MCV values but sorted so that all the initial values
have hashvalue=0, triggering (growEnabled=false). But that's rather
unlikely to happen in practice I guess.
A more likely failure scenario is a hash join higher up the plan,
processing results of other joins etc. In that case the estimates will
be tricky, although the planner chooses quite pessimistic defaults in
those cases.
--
Tomas Vondra http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From | Date | Subject | |
---|---|---|---|
Next Message | Dmitry Dolgov | 2015-02-16 15:28:13 | Re: GSoC 2015 - mentors, students and admins. |
Previous Message | Thom Brown | 2015-02-16 12:54:07 | Re: GSoC 2015 - mentors, students and admins. |