Re: Question about difference in performance of 2 queries

From: Nicholas Shanny <nshanny(at)tripadvisor(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Sean Shanny <shannyconsulting(at)earthlink(dot)net>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Question about difference in performance of 2 queries
Date: 2003-12-30 15:00:30
Message-ID: E8CA1CB3-3AD8-11D8-BE83-000A95C41B96@tripadvisor.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

Tom,

I understand the problem and your solution makes sense although I am
still puzzled by the machine under-utilization. If you run the original
query and monitor the IO/CPU usage you find that it is minimal.

here is the output from iostat 1 for a brief portion of the query. I am
very curious to understand why when scanning the index the IO/CPU
utilization is seemingly low.

Cheers
Nick Shanny
TripAdvisor, Inc.

0 77 32.00 106 3.31 0.00 0 0.00 0.00 0 0.00 0 0 2 0
98
0 76 32.00 125 3.90 0.00 0 0.00 0.00 0 0.00 0 0 2
0 97
0 76 32.00 125 3.90 0.00 0 0.00 0.00 0 0.00 0 0 1
1 98
0 76 32.75 127 4.05 0.00 0 0.00 0.00 0 0.00 0 0 1
0 99
tty aacd0 acd0 fd0
cpu
tin tout KB/t tps MB/s KB/t tps MB/s KB/t tps MB/s us ni sy
in id
0 76 32.00 127 3.96 0.00 0 0.00 0.00 0 0.00 0 0 3
0 97
0 229 32.24 135 4.24 0.00 0 0.00 0.00 0 0.00 0 0 4
0 95
0 76 32.00 129 4.02 0.00 0 0.00 0.00 0 0.00 0 0 2
0 97
0 76 32.00 123 3.84 0.00 0 0.00 0.00 0 0.00 0 0 2
0 98
0 76 31.72 115 3.56 0.00 0 0.00 0.00 0 0.00 0 0 2
0 98
0 76 32.50 126 3.99 0.00 0 0.00 0.00 0 0.00 0 0 3
1 96
0 76 32.00 123 3.84 0.00 0 0.00 0.00 0 0.00 0 0 3
0 97
0 76 32.00 122 3.81 0.00 0 0.00 0.00 0 0.00 1 0 2
0 97
0 76 32.00 135 4.21 0.00 0 0.00 0.00 0 0.00 0 0 2
1 97
0 76 32.00 97 3.03 0.00 0 0.00 0.00 0 0.00 0 0 3
0 97

On Dec 29, 2003, at 2:39 PM, Tom Lane wrote:

> Sean Shanny <shannyconsulting(at)earthlink(dot)net> writes:
>> Here is the pg_stats data. The explain analyze queries are still
>> running.
>
>> select * from pg_stats where tablename = 'f_pageviews' and attname =
>> 'content_key';
>> schemaname | tablename | attname | null_frac | avg_width |
>> n_distinct | most_common_vals | most_common_freqs
>> |
>> histogram_bounds | correlation
>> ------------+-------------+-------------+-----------+-----------
>> +------------+------------------+-----------------------
>> +---------------------------------------------------------------------
>> ----------------+-------------
>> public | f_pageviews | content_key | 0 | 4 |
>> 983 | {-1,1528483} | {0.749333,0.00166667} |
>
> Oh-ho, I see the problem: about 75% of your table has content_key = -1.
>
> Why is that a problem, you ask? Well, the planner realizes that
> "content_key > -1" is a pretty good restriction condition (better than
> the date condition, apparently) and so it tries to use that as the
> index
> scan condition. The problem is that in 7.4 and before, the btree index
> code implements a "> -1" scan starting boundary by finding the first -1
> and then advancing to the first key that's not -1. So you end up
> scanning through 75% of the index before anything useful happens :-(
>
> I just fixed this poor behavior in CVS tip a couple weeks ago:
> http://archives.postgresql.org/pgsql-committers/2003-12/msg00220.php
> but the patch seems too large and unproven to risk back-patching into
> 7.4.*.
>
> If you expect that a pretty large fraction of your data will always
> have
> dummy content_key, it'd probably be worth changing the index to not
> index -1's at all --- that is, make it a partial index with the
> condition "WHERE content_key > -1". Another workaround is to leave the
> index as-is but phrase the query WHERE condition as "content_key >= 0"
> instead of "> -1".
>
> regards, tom lane
>
> ---------------------------(end of
> broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faqs/FAQ.html
>

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Jonathan Gardner 2003-12-30 16:32:38 DELETE ... WHERE ctid IN (...) vs. Iteration
Previous Message Vivek Khera 2003-12-29 20:33:57 deferred foreign keys