Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct

From: Andres Freund <andres(at)anarazel(dot)de>
To: pgsql-bugs(at)postgresql(dot)org
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, casey(dot)shobe(at)messagesystems(dot)com
Subject: Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct
Date: 2012-01-12 00:40:34
Message-ID: 201201120140.35229.andres@anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

On Thursday, January 12, 2012 01:01:01 AM Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:

> Also, why the asymmetry in null handling? And why did you only touch
> one of the two code paths in eqjoinsel_semi? They have both got this
> issue of how to estimate with inadequate stats.
This patch was purely a proof of concept, sorry if that wasn't clear. I mostly
wanted to point out that real plans regressed with this change. Digging a bit
around I could find more examples where it caused real pain. But also cases
were the new behaviour was advantageous.
Unfortunately the pastebins where raptelan provided plans expired by now...
Perhaps he can provide them again?

If we aggree on a way to handle the stats I am happy to produce a patch that
actually tries to cover all the cases.

> > Whats your opinion on this?
> Looks pretty bogus to me. You're essentially assuming that the side of
> the join without statistics is unique, which is a mighty dubious
> assumption.
It sure is a bit dubious. But assuming that a semijoin that has max of n rows
on the inner side results in half of the outer sides rows (>> n) is pretty
bogus as well. Using the asumption of uniqueness for the outer side seems
sensible if its only used as a upper limit (Except in an antijoin ...).

Yes, my "patch" didn't even start to do this ;)

SELECT * FROM blub WHERE foo IN (SELECT something_with_aggregation); is not
exactly a fringe case, so I find it problematic regressing quite a bit in the
estimates.

> (In cases where we *know* it's unique, something like this
> could be reasonable, but I believe get_variable_numdistinct already
> accounts for such cases.)
Only that we infer uniqueness only from very few things unless I miss
something...

Andres

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Andres Freund 2012-01-12 00:53:02 Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct
Previous Message Tom Lane 2012-01-12 00:01:01 Re: Botched estimation in eqjoinsel_semi for cases without reliable ndistinct