Re: Worse performance on 8.2.0 than on 7.4.14

From: Rolf Østvik <rolfostvik(at)yahoo(dot)no>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Worse performance on 8.2.0 than on 7.4.14
Date: 2007-01-05 18:28:33
Message-ID: 20070105182833.94277.qmail@web26312.mail.ukl.yahoo.com
Lists: pgsql-performance


--- Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> The number-of-matching-rows estimate has gone up by a factor of 10,
> which undoubtedly has a lot to do with the much higher cost estimate.
> Do you have any idea why that is ... is the table really the same size
> in both servers? If so, could we see the pg_stats row for
> step_result_subset.uut_result on both servers?

The tables step_result_subset and uut_result_subset in both databases were created from the same
schema definition file and filled with data from the same data source file.

==== Server 7.4.14: ====

logistics_74# select count(*) from step_result_subset;
count
----------
17179506
(1 row)

logistics_74# select count(distinct uut_result) from step_result_subset;
count
--------
176450
(1 row)

logistics_74# analyse verbose step_result_subset;
INFO: analyzing "public.step_result_subset"
INFO: "step_result_subset": 92863 pages, 3000 rows sampled, 17179655 estimated total rows
ANALYZE

logistics_74# select * from pg_stats where tablename = 'step_result_subset' and attname = 'uut_result';
schemaname        | public
tablename         | step_result_subset
attname           | uut_result
null_frac         | 0
avg_width         | 4
n_distinct        | 57503
most_common_vals  | {70335,145211,17229,20091,21827,33338,34370,42426,47274,54146}
most_common_freqs | {0.001,0.001,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667,0.000666667}
histogram_bounds  | {213,30974,51300,68529,85053,100838,114971,128126,144230,161657,176691}
correlation       | 0.951364
(1 row)

==== Server 8.2.0: ====

logistics_82# select count(*) from step_result_subset;
count
----------
17179506
(1 row)

logistics_82# select count(distinct uut_result) from step_result_subset;
count
--------
176450
(1 row)

logistics_82# analyse verbose step_result_subset;
INFO: analyzing "public.step_result_subset"
INFO: "step_result_subset": scanned 3000 of 92863 pages, containing 555000 live rows and 0 dead
rows; 3000 rows in sample, 17179655 estimated total rows
ANALYZE

logistics_# select * from pg_stats where tablename = 'step_result_subset' and attname = 'uut_result';
schemaname        | public
tablename         | step_result_subset
attname           | uut_result
null_frac         | 0
avg_width         | 4
n_distinct        | 6516
most_common_vals  | {35010,111592,35790,41162,56844,57444,60709,73017,76295,106470}
most_common_freqs | {0.00166667,0.00166667,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333,0.00133333}
histogram_bounds  | {147,31791,54286,70928,85996,102668,117885,130947,144766,162098,176685}
correlation       | 0.954647
(1 row)

On server 8.2.0 I then need to set the statistics target on step_result_subset.uut_result to ~120
to get n_distinct into the same range as n_distinct on 7.4.14.
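
For reference, a minimal sketch of how a per-column statistics target like this is typically set and then checked. The value 120 is the one mentioned above; the exact statements used in this test were not posted, so treat this as an assumed reconstruction rather than the commands actually run.

```sql
-- Raise the per-column statistics target (the default in 7.4 and 8.2 is 10).
ALTER TABLE step_result_subset ALTER COLUMN uut_result SET STATISTICS 120;

-- The new target only takes effect once the table is re-analyzed.
ANALYZE VERBOSE step_result_subset;

-- Check the resulting distinct-value estimate.
SELECT n_distinct
  FROM pg_stats
 WHERE tablename = 'step_result_subset'
   AND attname = 'uut_result';
```

A larger target makes ANALYZE sample more rows, so the ANALYZE run itself takes longer.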

Even with a statistics target of 1000, n_distinct only reaches ~138 000. Is it correct that
_ideally_ n_distinct should equal the result of "select count(distinct uut_result) from
step_result_subset"?
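
As a rough illustration of why n_distinct matters here: to a first approximation, the number of rows expected to match one uut_result value is the estimated total row count divided by n_distinct. This is a simplification (the planner also accounts for most_common_vals and null_frac), and the column aliases below are made up for the example. The inputs are the figures from the output above.

```sql
-- Approximate rows per uut_result value: total rows / n_distinct.
SELECT round(17179655 / 57503.0)  AS est_rows_per_value_74,  -- n_distinct on 7.4.14
       round(17179655 / 6516.0)   AS est_rows_per_value_82,  -- n_distinct on 8.2.0
       round(17179506 / 176450.0) AS actual_avg_per_value;   -- count(*) / count(distinct uut_result)
```

This gives about 300 rows per value on 7.4.14 against about 2600 on 8.2.0, close to the factor of 10 Tom noted, while the true average is just under 100.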

====
Even with better statistics on step_result_subset.uut_result, neither 7.4.14 nor 8.2.0 manages to
pick the best plan when I want to select bigger datasets (in my examples, that would mean setting
an earlier date in the where clause "ur.start_date_time > '2006-12-11'"). I will continue to
adjust other parameters and see what I can manage myself.

Best regards
Rolf Østvik

