ORDER BY ... LIMIT and JOIN

From: Fizu <Fizu(at)advancedsl(dot)com(dot)ar>
To: pgsql-performance(at)postgresql(dot)org
Subject: ORDER BY ... LIMIT and JOIN
Date: 2009-08-08 06:02:47
Message-ID: 92cf04420908072302u7cec91f0se962665e771a5d03@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Hello,

I'm trying to optimize the follow query which returns the top users
ordered by ranking. I'll show you my schema and "explain analyze" for
each case.

So, i'm asking two things:

1) Why "ranking" index is not used in the second query when sorting.
2) Am i missing some obvious optimization like a missing index? :)

Schemas:

# \d ranking
Table "public.ranking"
Column | Type | Modifiers
-----------+-----------------------+-----------
ranking | bigint |
score | double precision |
username | character varying(20) | not null
variation | bigint |
Indexes:
"ranking_tmp_pkey1" PRIMARY KEY, btree (username)
"idxrank_6057" btree (ranking) CLUSTER

# \d user
Table "public.user"
Column | Type | Modifiers
------------+-----------------------+---------------------------------------------------
id | integer | not null default
nextval('user_id_seq'::regclass)
username | character varying(20) | not null
about | text |
name | character varying(50) |
photo | text |
country_id | integer |
Indexes:
"user_pkey" PRIMARY KEY, btree (username)
"country_ranking_user_idx" btree (country_id)

Explain:

# explain analyze SELECT * FROM "ranking" INNER JOIN "user" ON
("ranking"."username" = "user"."username") WHERE "user"."country_id" =
1 ORDER BY "ranking"."ranking" ASC LIMIT 100;

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=13.03..13.04 rows=1 width=180) (actual
time=965.229..965.302 rows=100 loops=1)
-> Sort (cost=13.03..13.04 rows=1 width=180) (actual
time=965.227..965.256 rows=100 loops=1)
Sort Key: ranking.ranking
Sort Method: top-N heapsort Memory: 56kB
-> Nested Loop (cost=0.00..13.02 rows=1 width=180) (actual
time=0.049..900.847 rows=57309 loops=1)
-> Index Scan using country_ranking_user_idx on "user"
(cost=0.00..6.49 rows=1 width=145) (actual time=0.023..57.633
rows=57309 loops=1)
Index Cond: (country_id = 1)
-> Index Scan using ranking_tmp_pkey1 on ranking
(cost=0.00..6.52 rows=1 width=35) (actual time=0.013..0.013 rows=1
loops=57309)
Index Cond: ((ranking.username)::text =
("user".username)::text)
Total runtime: 965.412 ms
(10 rows)

# explain analyze SELECT * FROM "ranking" INNER JOIN "user" ON
("ranking"."username" = "user"."username") ORDER BY
"ranking"."ranking" ASC LIMIT 100;

QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..137.02 rows=100 width=180) (actual
time=0.056..1.973 rows=100 loops=1)
-> Nested Loop (cost=0.00..3081316.65 rows=2248753 width=180)
(actual time=0.055..1.921 rows=100 loops=1)
-> Index Scan using idxrank_6057 on ranking
(cost=0.00..70735.73 rows=2248753 width=35) (actual time=0.021..0.076
rows=100 loops=1)
-> Index Scan using user_pkey on "user" (cost=0.00..1.33
rows=1 width=145) (actual time=0.016..0.017 rows=1 loops=100)
Index Cond: (("user".username)::text = (ranking.username)::text)
Total runtime: 2.043 ms
(6 rows)

Thanks!
Fz

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Pierre Frédéric Caillaud 2009-08-08 09:26:34 Re: PG-related ACM Article: "The Pathologies of Big Data"
Previous Message Bruce Momjian 2009-08-08 04:13:49 Re: [PERFORM] BUG #4919: CREATE USER command slows down system performance