Re: Subquery in a JOIN not getting restricted?

From: Jay Levitt <jay(dot)levitt(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Kevin Grittner <Kevin(dot)Grittner(at)wicourts(dot)gov>, Merlin Moncure <mmoncure(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Subquery in a JOIN not getting restricted?
Date: 2011-11-16 14:06:34
Message-ID: 4EC3C36A.3020502@gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom Lane wrote:
> Jay Levitt<jay(dot)levitt(at)gmail(dot)com> writes:
>> If the query was more like
>
>> select questions.id
>> from questions
>> join (
>> select sum(u.id)
>> from users as u
>> group by u.id
>> ) as s
>> on s.id = questions.user_id
>> where questions.id = 1;
>
>> would you no longer be surprised that it scanned all user rows?
>
> I'd suggest rephrasing the query to do the join underneath the GROUP BY.

Well, my real goal is to have that inner query in a set-returning function
that gives a computed table of other users relative to the current user, and
then be able to JOIN that with other things and ORDER BY it:

select questions.id
from questions
join (select * from relevance(current_user)) as r
on r.id = questions.user_id
where questions.id = 1;

I assume there's no way for that function (in SQL or PL/pgSQL) to reach to
the upper node and say "do that join again here", or force the join order
from down below? I can't imagine how there could be, but never hurts to ask.
Right now, our workaround is to pass the joined target user as a function
parameter and do the JOIN in the function, but that means we have to put the
function in the select list, else we hit the lack of LATERAL support:

-- This would need LATERAL

select questions.id
from questions
join (
select * from relevance(current_user, questions.user_id)) as r
)
on r.id = questions.user_id
where questions.id = 1;

-- This works but has lots of row-at-a-time overhead

select questions.id, (
select * from relevance(current_user, questions.user_id)
) as r
from questions
where questions.id = 1;

Again, just checking if there's a solution I'm missing. I know the
optimizer is only asymptotically approaching optimal!

Jay

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tomas Vondra 2011-11-16 16:34:10 Re: Slow queries / commits, mis-configuration or hardware issues?
Previous Message Greg Smith 2011-11-16 06:51:15 Re: What's the state of postgresql on ext4 now?