Re: New hashed IN code ignores distinctiveness of subquery

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Bradley Baetz <bbaetz(at)acm(dot)org>
Cc: pgsql-bugs(at)postgresql(dot)org
Subject: Re: New hashed IN code ignores distinctiveness of subquery
Date: 2003-01-27 04:18:31
Message-ID: 19407.1043641111@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs pgsql-hackers

Bradley Baetz <bbaetz(at)acm(dot)org> writes:
> On Sun, Jan 26, 2003 at 09:43:18PM -0500, Tom Lane wrote:
>> We're already checking that as a separate plan alternative. The
>> implementation could be improved a little, though --- we could combine
>> the uniq-ification into the Hash node.

> Right, or skip it entirely when selecting stuff with unique constraints.

I'm hesitant to do that until we have some scheme in place for
invalidating cached plans. Right now, if you drop an index that is used
by some cached plan, you'll hear about it next time the plan is used.
But if the plan doesn't directly use the index, yet depends on its
existence to be correct, you'll get no error and subtly(?) wrong
answers. I don't mind depending on such assumptions in estimating
costs, but I do mind depending on them for correct answers.

> I don't think it is. The number of rows is correct if you do product_id
> IN (1) vs product_id IN (1,2) vs product_id IN (1,2,3) and so on.

But that's a completely different code path; it doesn't even enter the
routines we're concerned about here.

> cost_hashjoin
> probably needs to be taught about the short circuiting done for _IN,

Yeah, I think so. Stepping through cost_hashjoin shows that the major
component of the inflated cost is coming from the per-tuple CPU costs,
which are inflated because we're not allowing for the IN shortcircuit.

> What is the point of JOIN_UNIQUE_{INNER,OUTER}, though? What does it do
> that JOIN_IN doesn't?

Uniqify the inner/outer path and then do a normal inner join. See
joinpath.c.

> executor/* doesn't appear to use it.

No; the executor never sees JOIN_REVERSE_IN, either.

regards, tom lane

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Bradley Baetz 2003-01-27 04:50:41 Re: New hashed IN code ignores distinctiveness of subquery
Previous Message Bradley Baetz 2003-01-27 03:22:57 Re: New hashed IN code ignores distinctiveness of subquery

Browse pgsql-hackers by date

  From Date Subject
Next Message Bradley Baetz 2003-01-27 04:50:41 Re: New hashed IN code ignores distinctiveness of subquery
Previous Message Tatsuo Ishii 2003-01-27 04:06:03 Re: Sorting Chinese data in Postgresql 7.3.1