Skip site navigation (1) Skip section navigation (2)

Nested query performance issue

From: Glenn Maynard <glennfmaynard(at)gmail(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Nested query performance issue
Date: 2009-04-08 21:09:24
Message-ID: d18085b50904081409xcd4dc21rbc5ea6f71a61dd1a@mail.gmail.com (view raw or flat)
Thread:
Lists: pgsql-performance
(This is related to an earlier post on -sql.)

I'm querying for the N high scores for each game, with two tables:
scores and games.

CREATE TABLE game (id SERIAL NOT NULL PRIMARY KEY);
CREATE TABLE score (id SERIAL NOT NULL PRIMARY KEY, score REAL,
game_id INTEGER REFERENCES game (id));
-- test data: 1000 games, 100000 scores
INSERT INTO game (id) select generate_series(1,1000);
INSERT INTO score (game_id, score) select game.id, random() from game,
generate_series(1,100);
CREATE INDEX score_idx1 ON score (game_id, score desc);
ANALYZE;

This query retrieves the single highest score for each game, but
doesn't allow any more than that--I can't get the top five scores for
each game.  However, it's very fast: on the above test data, it runs
in 25ms on my system.  With 1000000 scores, it takes 40ms.

SELECT s.* FROM score s
WHERE s.id IN (
   -- Get the high scoring score ID for each game:
   SELECT
   (
       -- Get the high score for game g:
       SELECT s2.id FROM score s2 WHERE s2.game_id = g.id ORDER BY
s2.score DESC LIMIT 1
   )
   FROM game g
);


This rewrite allows getting the top N scores.  Unfortunately, this one
takes 950ms for the same data.  With 1000000 scores, it takes 14800ms.

SELECT s.* FROM score s, game g
WHERE s.game_id = g.id AND
 s.id IN (
    SELECT s2.id FROM score s2 WHERE s2.game_id=g.id ORDER BY s2.score
DESC LIMIT 1
 );


This seems simple: for each game, search for the highest score, and
then scan the tree to get the next N-1 highest scores.  The first
version does just that, but the second one is doing a seq scan over
score.

I do want to be able to use a LIMIT higher than 1, which only works
with the second form.  Any suggestions of how to get the efficient
scanning of the first while being able to use a LIMIT greater than 1?

(It'd even be faster to make several calls to the first version,
varying an OFFSET to get each high score--but that's terrible.)

-- 
Glenn Maynard

Responses

pgsql-performance by date

Next:From: Віталій ТимчишинDate: 2009-04-08 21:30:48
Subject: Re: Nested query performance issue
Previous:From: Dimitri FontaineDate: 2009-04-08 20:46:47
Subject: Re: Best replication solution?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group