Quick Links

Re: slow self-join query

From:	Robert Poor <rdpoor(at)gmail(dot)com>
To:	Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject:	Re: slow self-join query
Date:	2012-03-18 14:37:24
Message-ID:	CAGHqdqU4C7JhdCdb47OHO6KTKa4YwxSAZffnMyLYsVD0Bt1T5g@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-performance

On Sat, Mar 17, 2012 at 23:07, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>wrote:

> Yeah try setting [work_mem] to something absurd like 500MB and see if the
> plan changes.
>

Suweet! Sorting now runs in-memory, and that makes a big difference, even
when groveling over 1M records (under 12 seconds rather than 7 hours).
Results in

http://explain.depesz.com/s/hNO

On Sat, Mar 17, 2012 at 23:09, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
wrote:

> Also it looks like you're still not using the index on this:
>
> Subquery Scan u1 (cost=0.00..313.55 rows=50 width=4) (actual
> time=0.030..147.136 rows=10000 loops=1)
>
> Filter: ((u1.type)::text = 'User::Twitter'::text)
>
> Are you sure you're using an indexable condition?
>

I know that users.type is indexed -- what would keep that from being
honored? FWIW, I believe that all user.type fields are set to
User::Twitter, but that will change in the future.

On Sat, Mar 17, 2012 at 23:12, Scott Marlowe <scott(dot)marlowe(at)gmail(dot)com>
wrote:

>
> Also also this looks like it's the most expensive operation:
>
> Seq Scan on followings f2 (cost=0.00..93523.95 rows=5534395 width=8)
> (actual time=0.041..19365.834 rows=5535964 loops=1)
>
> I'm guessing the f2.follower_id isn't very selective?
>

Not 100% sure what you mean -- f2.follower_id is very sparse (compared to
f1.follower_id), but that's the point of this particular query. But since
upping work_mem makes it run really fast, I'm not overly concerned about
this one. Thanks for your help!

One last thought: I could re-cast this as a subquery / query pair, each
with a single join. Am I correct in thinking that could make it really
easy on the planner (especially if the tables were properly indexed)?

Thanks again.

- r

In response to

Re: slow self-join query at 2012-03-18 00:57:08 from Scott Marlowe

Responses

Re: slow self-join query at 2012-03-18 14:51:21 from Andrew Dunstan
Re: slow self-join query at 2012-03-18 15:30:05 from Scott Marlowe
Re: slow self-join query at 2012-03-18 15:44:46 from Scott Marlowe

Browse pgsql-performance by date

	From	Date	Subject
Next Message	Andrew Dunstan	2012-03-18 14:51:21	Re: slow self-join query
Previous Message	Scott Marlowe	2012-03-18 00:57:08	Re: slow self-join query