Re: Really bad blowups with hash outer join and nulls

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org, gavin(at)shopwindow(dot)me
Subject: Re: Really bad blowups with hash outer join and nulls
Date: 2015-02-15 15:33:25
Message-ID: 17667.1424014405@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> A quick test suggests that initializing the hash value to ~0 rather than
> 0 has a curious effect: the number of batches still explodes, but the
> performance does not suffer the same way. (I think because almost all
> the batches end up empty.) I think this is worth doing even in the
> absence of a more general solution; nulls are common enough and
> important enough that they shouldn't be the worst-case value if it can
> be avoided.

I think that's unlikely to be a path to a good solution.

At least part of the problem here is that estimate_hash_bucketsize()
supposes that nulls can be ignored --- which is normally true, and
invalidates your claim that they're common. But in a RIGHT JOIN
situation, they need to be considered as if they were regular keys.
That would probably be sufficient to dissuade the planner from choosing
a hash join in this example.

There may also be something we can do in the executor, but it would
take closer analysis to figure out what's going wrong. I don't think
kluging the behavior for NULL in particular is the answer.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Florian Pflug 2015-02-15 16:14:21 Re: restrict global access to be readonly
Previous Message Petr Jelinek 2015-02-15 14:25:59 Re: Logical Replication Helpers WIP for discussion