Re: Really bad blowups with hash outer join and nulls

From: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
To: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Really bad blowups with hash outer join and nulls
Date: 2015-02-16 02:38:04
Message-ID: 87a90elk7f.fsf@news-spur.riddles.org.uk
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

>>>>> "Tomas" == Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com> writes:

Tomas> Improving the estimates is always good, but it's not going to
Tomas> fix the case of non-NULL values (it shouldn't be all that
Tomas> difficult to create such examples with a value whose hash starts
Tomas> with a bunch of zeroes).

Right now, I can't get it to plan such an example, because (a) if there
are no stats to work from then the planner makes fairly pessimistic
assumptions about hash bucket filling, and (b) if there _are_ stats to
work from, then a frequently-occurring non-null value shows up as an MCV
and the planner takes that into account to calculate bucketsize.

The problem could only be demonstrated for NULLs because the planner was
ignoring NULL for the purposes of estimating bucketsize, which is
correct for all join types except RIGHT and FULL (which, iirc, are more
recent additions to the hashjoin repertoire).

If you want to try testing it, you may find this useful:

select i, hashint8(i) from unnest(array[1474049294, -1779024306, -1329041947]) u(i);
i | hashint8
-------------+----------
1474049294 | 0
-1779024306 | 0
-1329041947 | 0
(3 rows)

(those are the only three int4 values that hash to exactly 0)

It's probably possible to construct pathological cases by finding a lot
of different values with zeros in the high bits of the hash, but that's
something that wouldn't be likely to happen by chance.

Tomas> I think this might be solved by relaxing the check a bit.

Yeah, that looks potentially useful.

--
Andrew (irc:RhodiumToad)

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2015-02-16 02:46:15 Re: New CF app deployment
Previous Message Peter Geoghegan 2015-02-16 02:29:36 Re: INSERT ... ON CONFLICT {UPDATE | IGNORE} 2.0