Nested query performance issue

From: Glenn Maynard <glennfmaynard(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Nested query performance issue
Date: 2009-04-08 21:09:24
Message-ID: d18085b50904081409xcd4dc21rbc5ea6f71a61dd1a@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

(This is related to an earlier post on -sql.)

I'm querying for the N high scores for each game, with two tables:
scores and games.

CREATE TABLE game (id SERIAL NOT NULL PRIMARY KEY);
CREATE TABLE score (id SERIAL NOT NULL PRIMARY KEY, score REAL,
game_id INTEGER REFERENCES game (id));
-- test data: 1000 games, 100000 scores
INSERT INTO game (id) select generate_series(1,1000);
INSERT INTO score (game_id, score) select game.id, random() from game,
generate_series(1,100);
CREATE INDEX score_idx1 ON score (game_id, score desc);
ANALYZE;

This query retrieves the single highest score for each game, but
doesn't allow any more than that--I can't get the top five scores for
each game. However, it's very fast: on the above test data, it runs
in 25ms on my system. With 1000000 scores, it takes 40ms.

SELECT s.* FROM score s
WHERE s.id IN (
-- Get the high scoring score ID for each game:
SELECT
(
-- Get the high score for game g:
SELECT s2.id FROM score s2 WHERE s2.game_id = g.id ORDER BY
s2.score DESC LIMIT 1
)
FROM game g
);

This rewrite allows getting the top N scores. Unfortunately, this one
takes 950ms for the same data. With 1000000 scores, it takes 14800ms.

SELECT s.* FROM score s, game g
WHERE s.game_id = g.id AND
s.id IN (
SELECT s2.id FROM score s2 WHERE s2.game_id=g.id ORDER BY s2.score
DESC LIMIT 1
);

This seems simple: for each game, search for the highest score, and
then scan the tree to get the next N-1 highest scores. The first
version does just that, but the second one is doing a seq scan over
score.

I do want to be able to use a LIMIT higher than 1, which only works
with the second form. Any suggestions of how to get the efficient
scanning of the first while being able to use a LIMIT greater than 1?

(It'd even be faster to make several calls to the first version,
varying an OFFSET to get each high score--but that's terrible.)

--
Glenn Maynard

Responses

Browse pgsql-performance by date

  From Date Subject
Next Message Віталій Тимчишин 2009-04-08 21:30:48 Re: Nested query performance issue
Previous Message Dimitri Fontaine 2009-04-08 20:46:47 Re: Best replication solution?