Re: Seq scan on join, not on subselect? analyze this

From: Bryce Nesbitt <bryce2(at)obviously(dot)com>
To: sql pgsql <pgsql-sql(at)postgresql(dot)org>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: Seq scan on join, not on subselect? analyze this
Date: 2008-11-02 20:14:43
Message-ID: 490E0A33.3060403@obviously.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-sql

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta content="text/html;charset=UTF-8" http-equiv="Content-Type">
</head>
<body bgcolor="#ffffff" text="#000000">
Tom Lane wrote:<br>
<blockquote cite="mid:4529(dot)1225651101(at)sss(dot)pgh(dot)pa(dot)us" type="cite">
<pre wrap="">Why are the rowcount estimates so far off? Maybe you need to increase
the statistics target for this table.
regards, tom lane
</pre>
</blockquote>
Tom,<br>
How does one tell the rowcount is off in a query plan?  I've never
found a great reference on interpreting the query analyze output!<br>
<br>
Upping the stats target made <i>little</i> difference in the estimated
row count, but a <i>huge</i> difference in the query.  It also mattered<br>
only on the context_key column, not the other columns.  Why would that
be?<br>
<br>
<font size="-1"><tt><br>
lyell5=&gt; \pset format wrapped<br>
lyell5=&gt; select * from pg_stats where tablename='article_words';<br>
+------------+-------------+----------+-----------+-----------+------------+--------------------+-------------------+---------------------+-------------+<br>
| schemaname |  tablename  | attname  | null_frac | avg_width |
n_distinct |  most_common_vals  | most_common_freqs | 
histogram_bounds   | correlation |<br>
+------------+-------------+----------+-----------+-----------+------------+--------------------+-------------------+---------------------+-------------+<br>
| public     | article_wor | word_key |         0 |         4 |      
6361 | {382,226,1155,2130 | {0.003,0.00233333 | {3,623,1376,2074,31 |  
0.0292482 |<br>
|            ;
ds                                                          ;
,2246,3328,8999,73 ; ,0.00233333,0.002 ;
54,4601,7269,12770,               |<br>
|                                                                         
; 3,1760,1796}       ; 33333,0.00233333, ;
30017,212496,582068               |<br>
|                                                                                              
; 0.00233333,0.0023 ; 0}                                |<br>
|                                                                                              
; 3333,0.002,0.002,                                     |<br>
|                                                                                              
; 0.002}                                                |<br>
| public     | article_wor | context_ |         0 |         4 |      
4671 | {639302,113013,133 | {0.00233333,0.001 | {28,42838,92697,140
|    0.867505 |<br>
|            ; ds          ;
key                                           ; 052,211978,508496, ;
66667,0.00166667, ; 684,202950,248442,3               |<br>
|                                                                         
; 545123,590829,5985 ; 0.00166667,0.0016 ;
38954,403025,498952               |<br>
|                                                                         
; 95,649645,37658}   ; 6667,0.00166667,0 ;
,584048,654070}                   |<br>
|                                                                                              
; .00166667,0.00166                                     |<br>
|                                                                                              
; 667,0.00166667,0.                                     |<br>
|                                                                                              
; 00133333}                                             |<br>
+------------+-------------+----------+-----------+-----------+------------+--------------------+-------------------+---------------------+-------------+<br>
lyell5=&gt; analyze verbose article_words;<br>
INFO:  "article_words": scanned 3000 of 1125489 pages, containing
555000 live rows<br>
and 0 dead rows; 3000 rows in sample, 208,215,465 estimated total rows<br>
<br>
lyell5=&gt; set default_statistics_target to 500;<br>
lyell5=&gt; analyze verbose article_words;<br>
INFO:  "article_words": scanned 150000 of 1125489 pages, containing
27749820 live rows<br>
and 180 dead rows; 150000 rows in sample, 208,214,114 estimated total
rows<br>
<br>
<br>
<br>
But it makes a huge impact on this query:<br>
<br>
</tt></font>
<table border="1" cellpadding="2" cellspacing="2" width="30%">
<tbody>
<tr>
<td bgcolor="#cccccc" valign="top">stats target<font
color="#333333"><br>
</font></td>
<td bgcolor="#cccccc" valign="top">sampled<br>
</td>
<td bgcolor="#cccccc" valign="top">query duration after analyze<br>
</td>
</tr>
<tr>
<td valign="top">10<br>
</td>
<td valign="top">3000<br>
</td>
<td valign="top">80600ms<br>
</td>
</tr>
<tr>
<td valign="top">15<br>
</td>
<td valign="top">4500<br>
</td>
<td valign="top">64000ms</td>
</tr>
<tr>
<td valign="top">20<br>
</td>
<td valign="top">6000<br>
</td>
<td valign="top">4.2ms<br>
</td>
</tr>
<tr>
<td valign="top">30<br>
</td>
<td valign="top">9000<br>
</td>
<td valign="top">4.2ms<br>
</td>
</tr>
<tr>
<td valign="top">250<br>
</td>
<td valign="top">75000<br>
</td>
<td valign="top">4.2ms<br>
</td>
</tr>
<tr>
<td valign="top">500<br>
</td>
<td valign="top">?<br>
</td>
<td valign="top">4.2ms<br>
</td>
</tr>
<tr>
<td valign="top">1000<br>
</td>
<td valign="top">300000<br>
</td>
<td valign="top">4.2ms<br>
</td>
</tr>
</tbody>
</table>
<font size="-1"><tt><br>
Is there any good reason not to set stats target to 1000 always?<br>
<br>
<br>
<br>
<br>
lyell5=&gt; alter table article_words alter word_key set statistics
30;analyze verbose article_words;<br>
explain analyze select words.* from article_words join words using
(word_key) where context_key=535462;<br>
INFO:  "article_words": scanned 9000 of 1125489 pages, containing
1665000 live rows and 0 dead rows; 9000 rows in sample, 208215465
estimated total rows<br>
+------------------------------------------------------------------------------------------------------------------------------------------------+<br>
|                                                                  
QUERY
PLAN                                                                   |<br>
+------------------------------------------------------------------------------------------------------------------------------------------------+<br>
| Nested Loop  (cost=0.00..160395.31 rows=16170 width=13) (actual
time=0.034..4.025 rows=777
loops=1)                                            |<br>
|   -&gt;  Index Scan using article_word_idx on article_words 
(cost=0.00..18258.96 rows=16170 width=4) (actual time=0.023..0.344
rows=777 loops=1) |<br>
|         Index Cond: (context_key =
535462)                                                                                                    
|<br>
|   -&gt;  Index Scan using words_pkey on words  (cost=0.00..8.78
rows=1 width=13) (actual time=0.003..0.004 rows=1
loops=777)                      |<br>
|         Index Cond: (words.word_key =
article_words.word_key)                                                                                 
|<br>
| Total runtime: 4.231
ms                                                                                                                       
|<br>
+------------------------------------------------------------------------------------------------------------------------------------------------+<br>
<br>
<br>
lyell5=&gt; alter table article_words alter word_key set statistics
10;analyze verbose article_words;<br>
explain analyze select words.* from article_words join words using
(word_key) where context_key=535462;<br>
INFO:  "article_words": scanned 3000 of 1125489 pages, containing
555000 live rows and 0 dead rows; 3000 rows in sample, 208215465
estimated total rows<br>
+------------------------------------------------------------------------------------------------------------------------------------------------+<br>
|                                                                  
QUERY
PLAN                                                                   |<br>
+------------------------------------------------------------------------------------------------------------------------------------------------+<br>
| Hash Join  (cost=192089.54..265653.38 rows=44589 width=13) (actual
time=18809.184..80685.239 rows=777
loops=1)                                 |<br>
|   Hash Cond: (article_words.word_key =
words.word_key)                                                                                        
|<br>
|   -&gt;  Index Scan using article_word_idx on article_words 
(cost=0.00..44717.05 rows=44589 width=4) (actual time=0.025..0.597
rows=777 loops=1) |<br>
|         Index Cond: (context_key =
535462)                                                                                                    
|<br>
|   -&gt;  Hash  (cost=93818.13..93818.13 rows=5653313 width=13)
(actual time=18808.872..18808.872 rows=5651551
loops=1)                            |<br>
|         -&gt;  Seq Scan on words  (cost=0.00..93818.13 rows=5653313
width=13) (actual time=0.007..7845.824 rows=5651551
loops=1)                  |<br>
| Total runtime: 80686.217
ms                                                                                                                   
|<br>
+------------------------------------------------------------------------------------------------------------------------------------------------+<br>
<br>
</tt></font>
</body>
</html>

Attachment Content-Type Size
unknown_filename text/html 11.8 KB

In response to

Browse pgsql-sql by date

  From Date Subject
Next Message Sebastian Böhm 2008-11-03 09:29:55 reliable lock inside stored procedure
Previous Message Tom Lane 2008-11-02 18:38:21 Re: Seq scan on join, not on subselect? analyze this