Re: slow self-join query

From: Merlin Moncure <mmoncure(at)gmail(dot)com>
To: Robert Poor <rdpoor(at)gmail(dot)com>
Cc: Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: slow self-join query
Date: 2012-03-19 13:35:14
Message-ID: CAHyXU0w8D5UTe4XNN+s0+=kF=5-_xsyAiJFBJ2OTHkUeJUXkTg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Sun, Mar 18, 2012 at 10:57 PM, Robert Poor <rdpoor(at)gmail(dot)com> wrote:
> On Sun, Mar 18, 2012 at 08:30, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com> wrote:
>> Why are you joining twice to the parent table?  If you're trying to
>> recurse without a with clause, then wouldn't you join the last table
>> to the one before it?
>
> I'm FAR from being an SQL expert; there's a significant chance that
> I'm not thinking about this right.  My intention for this query
> (slightly renamed since the original post):
>
>    SELECT F2.leader_id as leader_id, COUNT(U1.id) AS count
>      FROM users AS U1
> INNER JOIN user_associations AS F1 ON F1.follower_id = U1.id
> INNER JOIN user_associations AS F2 ON F2.follower_id = U1.id
>     WHERE F1.leader_id = 321
>  GROUP BY F2.leader_id
>
> is "among users that follow leader 321, who are the most widely
> followed leaders?", or more formally, find all the users that are
> followers of user 321 (via inner join on F1)  Of those users, tally up
> their leaders so we know which leaders are most popular.  Recall that
> the user_associations table is simply a user-to-user association:
>
>  create_table "user_associations", :force => true do |t|
>  t.integer  "follower_id"
>  t.integer  "leader_id"
>  end
>
> Is there a better way to do this?

hm. Something does not seem right with your query. You're joining in
the same table twice with the same clause:

INNER JOIN user_associations AS F1 ON F1.follower_id = U1.id
INNER JOIN user_associations AS F2 ON F2.follower_id = U1.id

I think you meant to cascade through the follower back to the leader.
(maybe not..it's early monday and the coffee hasn't worked it's way
through the fog yet)...

Also, do you really need to involve the user table? You're counting
U1.Id which is equivalent to F2.follower_id.

try this and see what pops out (i may not have the F1/F2 join quite right):
SELECT F2.leader_id as leader_id, COUNT(*) AS count
FROM user_associations AS F1
INNER JOIN user_associations AS F2 ON F1.follower_id = F2.leader_id
WHERE F1.leader_id = 321
GROUP BY 1;

merlin

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Kevin Grittner 2012-03-19 14:27:06 Re: slow self-join query
Previous Message Robert Poor 2012-03-19 03:57:37 Re: slow self-join query