Skip site navigation (1) Skip section navigation (2)

Re: Question about difference in performance of 2 queries

From: Nicholas Shanny <nshanny(at)tripadvisor(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Sean Shanny <shannyconsulting(at)earthlink(dot)net>,pgsql-performance(at)postgresql(dot)org
Subject: Re: Question about difference in performance of 2 queries
Date: 2003-12-30 15:00:30
Message-ID: E8CA1CB3-3AD8-11D8-BE83-000A95C41B96@tripadvisor.com (view raw or flat)
Thread:
Lists: pgsql-performance
Tom,

I understand the problem and your solution makes sense although I am  
still puzzled by the machine under-utilization. If you run the original  
query and monitor the IO/CPU usage you find that it is minimal.

here is the output from iostat 1 for a brief portion of the query. I am  
very curious to understand why when scanning the index the IO/CPU  
utilization is seemingly low.

Cheers
Nick Shanny
TripAdvisor, Inc.

   0   77 32.00 106  3.31   0.00   0  0.00   0.00   0  0.00   0  0  2  0  
98
    0   76 32.00 125  3.90   0.00   0  0.00   0.00   0  0.00   0  0  2   
0 97
    0   76 32.00 125  3.90   0.00   0  0.00   0.00   0  0.00   0  0  1   
1 98
    0   76 32.75 127  4.05   0.00   0  0.00   0.00   0  0.00   0  0  1   
0 99
       tty           aacd0             acd0              fd0              
cpu
  tin tout  KB/t tps  MB/s   KB/t tps  MB/s   KB/t tps  MB/s  us ni sy  
in id
    0   76 32.00 127  3.96   0.00   0  0.00   0.00   0  0.00   0  0  3   
0 97
    0  229 32.24 135  4.24   0.00   0  0.00   0.00   0  0.00   0  0  4   
0 95
    0   76 32.00 129  4.02   0.00   0  0.00   0.00   0  0.00   0  0  2   
0 97
    0   76 32.00 123  3.84   0.00   0  0.00   0.00   0  0.00   0  0  2   
0 98
    0   76 31.72 115  3.56   0.00   0  0.00   0.00   0  0.00   0  0  2   
0 98
    0   76 32.50 126  3.99   0.00   0  0.00   0.00   0  0.00   0  0  3   
1 96
    0   76 32.00 123  3.84   0.00   0  0.00   0.00   0  0.00   0  0  3   
0 97
    0   76 32.00 122  3.81   0.00   0  0.00   0.00   0  0.00   1  0  2   
0 97
    0   76 32.00 135  4.21   0.00   0  0.00   0.00   0  0.00   0  0  2   
1 97
    0   76 32.00  97  3.03   0.00   0  0.00   0.00   0  0.00   0  0  3   
0 97

On Dec 29, 2003, at 2:39 PM, Tom Lane wrote:

> Sean Shanny <shannyconsulting(at)earthlink(dot)net> writes:
>> Here is the pg_stats data.  The explain analyze queries are still  
>> running.
>
>> select * from pg_stats where tablename = 'f_pageviews' and attname =
>> 'content_key';
>>  schemaname |  tablename  |   attname   | null_frac | avg_width |
>> n_distinct | most_common_vals |   most_common_freqs
>> |
>> histogram_bounds                                   | correlation
>> ------------+-------------+-------------+-----------+----------- 
>> +------------+------------------+----------------------- 
>> +--------------------------------------------------------------------- 
>> ----------------+-------------
>>  public     | f_pageviews | content_key |         0 |         4 |
>> 983 | {-1,1528483}     | {0.749333,0.00166667} |
>
> Oh-ho, I see the problem: about 75% of your table has content_key = -1.
>
> Why is that a problem, you ask?  Well, the planner realizes that
> "content_key > -1" is a pretty good restriction condition (better than
> the date condition, apparently) and so it tries to use that as the  
> index
> scan condition.  The problem is that in 7.4 and before, the btree index
> code implements a "> -1" scan starting boundary by finding the first -1
> and then advancing to the first key that's not -1.  So you end up
> scanning through 75% of the index before anything useful happens :-(
>
> I just fixed this poor behavior in CVS tip a couple weeks ago:
> http://archives.postgresql.org/pgsql-committers/2003-12/msg00220.php
> but the patch seems too large and unproven to risk back-patching into
> 7.4.*.
>
> If you expect that a pretty large fraction of your data will always  
> have
> dummy content_key, it'd probably be worth changing the index to not
> index -1's at all --- that is, make it a partial index with the
> condition "WHERE content_key > -1".  Another workaround is to leave the
> index as-is but phrase the query WHERE condition as "content_key >= 0"
> instead of "> -1".
>
> 			regards, tom lane
>
> ---------------------------(end of  
> broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html
>


In response to

pgsql-performance by date

Next:From: Jonathan GardnerDate: 2003-12-30 16:32:38
Subject: DELETE ... WHERE ctid IN (...) vs. Iteration
Previous:From: Vivek KheraDate: 2003-12-29 20:33:57
Subject: deferred foreign keys

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group