Re: New hashed IN code ignores distinctiveness of subquery

From: Bradley Baetz <bbaetz(at)acm(dot)org>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: New hashed IN code ignores distinctiveness of subquery
Date: 2003-01-27 03:22:57
Message-ID: 20030127032257.GA3391@mango.home
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

On Sun, Jan 26, 2003 at 09:43:18PM -0500, Tom Lane wrote:

> We're already checking that as a separate plan alternative. The
> implementation could be improved a little, though --- we could combine
> the uniq-ification into the Hash node.

Right, or skip it entirely when selecting stuff with unique constraints.

> AFAICS there are two or three different concepts of selectivity being
> tossed about here.

<snip>

Ah, OK. Yeah, I think I was confusing myself there.

>
> It might be that we have to bite the bullet and set up a different
> selectivity estimation procedure for IN. I'd prefer to avoid that
> but haven't really figured out if it's necessary or not.

I don't think it is. The number of rows is correct if you do product_id
IN (1) vs product_id IN (1,2) vs product_id IN (1,2,3) and so on. The
only difference for a subselect is that the number of values generated
is only known approximately, rather than exactly, but I don't think that
thats relevent.

> In the meantime, I think you are right that it's bogus that
> set_joinrel_size_estimates uses different equations for JOIN_IN and
> JOIN_UNIQUE_INNER. Whatever the decision is about how to do the
> estimation, these should be done the same way.

Hmm. So, I made that change, and now get an estimate of 50218 rows for
the join result, which is about right (although capping the result to
have a maximum of the number of input rows is valid for an inner join),
but the cost for the hashjoin is still 4281586.50. cost_hashjoin
probably needs to be taught about the short circuiting done for _IN, but
I'm not sure where to do that from a quick glance through the code.

What is the point of JOIN_UNIQUE_{INNER,OUTER}, though? What does it do
that JOIN_IN doesn't? executor/* doesn't appear to use it.

> regards, tom lane

Bradley

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Tom Lane 2003-01-27 04:18:31 Re: New hashed IN code ignores distinctiveness of subquery
Previous Message Tom Lane 2003-01-27 02:43:18 Re: New hashed IN code ignores distinctiveness of subquery

Browse pgsql-hackers by date

  From Date Subject
Next Message Tatsuo Ishii 2003-01-27 04:06:03 Re: Sorting Chinese data in Postgresql 7.3.1
Previous Message Tom Lane 2003-01-27 02:43:18 Re: New hashed IN code ignores distinctiveness of subquery