From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk>
Cc: pgsql-hackers(at)postgresql(dot)org, gavin(at)shopwindow(dot)me
Subject: Re: Really bad blowups with hash outer join and nulls
Date: 2015-02-15 15:33:25
Message-ID: 17667.1424014405@sss.pgh.pa.us
Lists: pgsql-hackers
Andrew Gierth <andrew(at)tao11(dot)riddles(dot)org(dot)uk> writes:
> A quick test suggests that initializing the hash value to ~0 rather than
> 0 has a curious effect: the number of batches still explodes, but the
> performance does not suffer the same way. (I think because almost all
> the batches end up empty.) I think this is worth doing even in the
> absence of a more general solution; nulls are common enough and
> important enough that they shouldn't be the worst-case value if it can
> be avoided.
I think that's unlikely to be a path to a good solution.
At least part of the problem here is that estimate_hash_bucketsize()
supposes that nulls can be ignored --- which is normally true, and
invalidates your claim that they're common. But in a RIGHT JOIN
situation, they need to be considered as if they were regular keys.
That would probably be sufficient to dissuade the planner from choosing
a hash join in this example.
There may also be something we can do in the executor, but it would
take closer analysis to figure out what's going wrong. I don't think
kluging the behavior for NULL in particular is the answer.
regards, tom lane