Skip site navigation (1) Skip section navigation (2)

Odd Sort/Limit/Max Problem

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-performance(at)postgresql(dot)org
Subject: Odd Sort/Limit/Max Problem
Date: 2002-12-13 19:55:51
Message-ID: 200212131155.51985.josh@agliodbs.com (view raw or flat)
Thread:
Lists: pgsql-performance
Folks, 

Consider this performance quandry brought to me by Elein, which I can replcate 
in 7.2.3 and in 7.4 devel:

case_clients is a medium-large table with about 110,000 rows.  The field 
date_resolved is a timestamp field which is indexed and allows nulls (in 
fact, is null for 40% of entries).

First, as expected, a regular aggregate is slow:

jwnet=> explain analyze select max(date_resolved) from case_clients;
NOTICE:  QUERY PLAN:

Aggregate  (cost=3076.10..3076.10 rows=1 width=4) (actual time=484.24..484.24 
rows=1 loops=1)
  ->  Seq Scan on case_clients  (cost=0.00..2804.48 rows=108648 width=4) 
(actual time=0.08..379.81 rows=108648 loops=1)
Total runtime: 484.44 msec


So we use the workaround standard for PostgreSQL:

jwnet=> explain analyze select date_resolved from case_clients order by 
date_resolved desc limit 1;
NOTICE:  QUERY PLAN:

Limit  (cost=0.00..1.50 rows=1 width=4) (actual time=0.22..0.23 rows=1 
loops=1)
  ->  Index Scan Backward using idx_caseclients_resolved on case_clients  
(cost=0.00..163420.59 rows=108648 width=4) (actual time=0.21..0.22 rows=2 
loops=1)
Total runtime: 0.33 msec

... which is fast, but returns NULL, since nulls sort to the bottom!  So we 
add IS NOT NULL:

jwnet=> explain analyze select date_resolved from case_clients where 
date_resolved is not null order by date_resolved desc limit 1;
NOTICE:  QUERY PLAN:

Limit  (cost=0.00..4.06 rows=1 width=4) (actual time=219.63..219.64 rows=1 
loops=1)
  ->  Index Scan Backward using idx_caseclients_resolved on case_clients  
(cost=0.00..163420.59 rows=40272 width=4) (actual time=219.62..219.62 rows=2 
loops=1)
Total runtime: 219.76 msec

Aieee!  Almost as slow as the aggregate!

Now, none of those times is huge on this test database, but on a larger 
database (> 1million rows) the performance problem is much worse.  For some 
reason, the backward index scan seems to have to transverse all of the NULLs 
before selecting a value.  I find this peculiar, as I was under the 
impression that NULLs were not indexed.

What's going on here?

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


Responses

pgsql-performance by date

Next:From: Stephan SzaboDate: 2002-12-13 20:10:20
Subject: Re: Odd Sort/Limit/Max Problem
Previous:From: Josh BerkusDate: 2002-12-13 18:18:40
Subject: Re: Capping CPU usage?

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group