Re: SELECT LIMIT 1 VIEW Performance Issue

From: K C Lau <kclau60(at)netvigator(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: SELECT LIMIT 1 VIEW Performance Issue
Date: 2005-09-26 07:46:30
Message-ID: 6.2.1.2.0.20050926152048.0542bcc8@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

At 20:17 05/09/23, K C Lau wrote:
>At 19:15 05/09/23, Simon Riggs wrote:
>>select distinct on (PlayerID) PlayerID,AtDate from Player a
>>where PlayerID='22220' order by PlayerId, AtDate Desc;
>>
>>Does that work for you?
>>
>>Best Regards, Simon Riggs
>
>esdt=> explain analyze select distinct on (PlayerID) PlayerID,AtDate from
>Player a where PlayerID='22220' order by PlayerId Desc, AtDate Desc;
> Unique (cost=0.00..1327.44 rows=2 width=23) (actual time=0.027..8.438
> rows=1 loops=1)
> -> Index Scan Backward using pk_player on player
> a (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.022..4.950
> rows=1743 loops=1)
> Index Cond: ((playerid)::text = '22220'::text)
> Total runtime: 8.499 ms
>
>That is the fastest of all queries looping the 1743 rows.
>I do get the desired result by adding LIMIT 1:
>
>esdt=> explain analyze select distinct on (PlayerID) PlayerID,AtDate from
>Player a where PlayerID='22220' order by PlayerId Desc, AtDate Desc LIMIT 1;
>
> Limit (cost=0.00..663.72 rows=1 width=23) (actual time=0.032..0.033
> rows=1 loops=1)
> -> Unique (cost=0.00..1327.44 rows=2 width=23) (actual
> time=0.028..0.028 rows=1 loops=1)
> -> Index Scan Backward using pk_player on player
> a (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.022..0.022
> rows=1 loops=1)
> Index Cond: ((playerid)::text = '22220'::text)
> Total runtime: 0.094 ms
>
>However, when I use that within a function in a view, it is slow again:
>
>esdt=> create or replace function player_max_atdate (varchar(32)) returns
>varchar(32) as $$
>esdt$> select distinct on (PlayerID) AtDate from player where PlayerID=
>$1 order by PlayerID desc, AtDate desc limit 1;
>esdt$> $$ language sql immutable;
>CREATE FUNCTION
>esdt=> create or replace view VCurPlayer3 as select * from Player where
>AtDate = player_max_atdate(PlayerID);
>CREATE VIEW
>esdt=> explain analyze select PlayerID,AtDate from VCurPlayer3 where
>PlayerID='22220';
>
> Index Scan using pk_player on player (cost=0.00..1331.83 rows=9
> width=23) (actual time=76.660..76.664 rows=1 loops=1)
> Index Cond: ((playerid)::text = '22220'::text)
> Filter: ((atdate)::text = (player_max_atdate(playerid))::text)
> Total runtime: 76.716 ms
>
>Why wouldn't the function get the row as quickly as the direct sql does?

Results from the following query suggests that the explain analyze output
above only tells half the story, and that the function is in fact called
1743 times:

esdt=> create or replace view VCurPlayer3 as select distinct on (PlayerID)
* from Player a where OID = (select distinct on (PlayerID) OID from Player
b where b.PlayerID = a.PlayerID and b.AtDate =
player_max_atdate(b.PlayerID) order by PlayerID desc, AtDate desc limit 1)
order by PlayerId Desc, AtDate desc;
CREATE VIEW
esdt=> explain analyze select PlayerID,AtDate from VCurPlayer3 where
PlayerID='22220';
Subquery Scan vcurplayer3 (cost=0.00..1715846.91 rows=1 width=68)
(actual time=0.640..119.124 rows=1 loops=1)
-> Unique (cost=0.00..1715846.90 rows=1 width=776) (actual
time=0.633..119.112 rows=1 loops=1)
-> Index Scan Backward using pk_player on player
a (cost=0.00..1715846.88 rows=9 width=776) (actual time=0.628..119.104
rows=1 loops=1)
Index Cond: ((playerid)::text = '22220'::text)
Filter: (oid = (subplan))
SubPlan
-> Limit (cost=0.00..976.38 rows=1 width=27) (actual
time=0.057..0.058 rows=1 loops=1743)
-> Unique (cost=0.00..976.38 rows=1 width=27)
(actual time=0.052..0.052 rows=1 loops=1743)
-> Index Scan Backward using pk_player on
player b (cost=0.00..976.36 rows=6 width=27) (actual time=0.047..0.047
rows=1 loops=1743)
Index Cond: ((playerid)::text = ($0)::text)
Filter: ((atdate)::text =
(player_max_atdate(playerid))::text)
Total runtime: 119.357 ms

It would also explain the very long time taken by the pl/pgsql function I
posted a bit earlier.

So I guess it all comes back to the basic question:

For the query select distinct on (PlayerID) * from Player a where
PlayerID='22220' order by PlayerId Desc, AtDate Desc;
can the optimizer recognise the fact the query is selecting by the primary
key (PlayerID,AtDate), so it can skip the remaining rows for that PlayerID,
as if LIMIT 1 is implied?

Best regards, KC.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Ron Peacetree 2005-09-26 11:01:59 Re: Advice on RAID card
Previous Message Chris Browne 2005-09-26 00:10:00 Re: Advice on RAID card