Skip site navigation (1) Skip section navigation (2)

Re: Bogus startup cost for WindowAgg

From: Mladen Gogala <mladen(dot)gogala(at)vmsinfo(dot)com>
To: Ants Aasma <ants(dot)aasma(at)eesti(dot)ee>
Cc: "pgsql-performance(at)postgresql(dot)org" <pgsql-performance(at)postgresql(dot)org>
Subject: Re: Bogus startup cost for WindowAgg
Date: 2010-10-13 19:35:02
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-performance
Ants Aasma wrote:
> I hit an issue with window aggregate costing while experimenting with
> providing a count of the full match along side a limited result set.
> Seems that the window aggregate node doesn't take into account that it
> has to consume the whole input before outputting the first row. When
> this is combined with a limit, the resulting cost estimate is wildly
> underestimated, leading to suboptimal plans.
> Is this a known issue? I couldn't find anything referring to this on
> the mailing list or todo.
What is your histogram size? That's defined by the 
default_statistics_target in your postgresql.conf.
Check the column histograms like this:

    news=> select attname,array_length(most_common_vals,1)
    from pg_stats
    where tablename='moreover_documents_y2010m09';
           attname        | array_length
     document_id          |            
     dre_reference        |            
     headline             |         1024
     author               |          212
     url                  |            
     rank                 |           59
     content              |         1024
     stories_like_this    |            
     internet_web_site_id |         1024
     harvest_time         |         1024
     valid_time           |         1024
     keyword              |           95
     article_id           |            
     media_type           |            5
     source_type          |            1
     created_at           |         1024
     autonomy_fed_at      |         1024
     language             |           37
    (18 rows)

    news=> show default_statistics_target;
    (1 row)

You will see that for most of the columns, the length of the histogram 
array corresponds to the value of the default_statistics_target 
parameter. For those that are smaller, the size is the total number of 
values in the column in the sample taken by the "analyze" command. The 
longer histogram, the better plan. In this case, the size does matter. 
Note that there are no histograms for the document_id and dre_reference 
columns. Those are the primary and unique keys, the optimizer can easily 
guess the distribution of values.

Mladen Gogala 
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251 
The Leader in Integrated Media Intelligence Solutions

In response to


pgsql-performance by date

Next:From: davidDate: 2010-10-13 19:37:45
Subject: Re: Slow count(*) again...
Previous:From: Reuven M. LernerDate: 2010-10-13 19:34:54
Subject: Re: SQL functions vs. PL/PgSQL functions

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group