Re: [osdldbt-general] Re: [GENERAL] how to get accurate

From: Christopher Browne <cbbrowne(at)acm(dot)org>
To: pgsql-performance(at)postgresql(dot)org
Subject: Re: [osdldbt-general] Re: [GENERAL] how to get accurate
Date: 2003-09-11 03:07:12
Message-ID: m3u17kyqe7.fsf@chvatal.cbbrowne.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

The world rejoiced as maryedie(at)osdl(dot)org (Mary Edie Meredith) wrote:
> We tried 1000 as the default and found that the plans were good
> plans and were consistent, but the pg_statistics was not exactly the
> same.
>
> We took Tom's' advice and tried SET SEED=0 (actually select setseed
> (0) ).

When you're trying to get strict replicability of results, setting the
seed to some specific value is necessary.

Some useful results could be attained by varying the seed, and seeing
how the plans change.

> We did runs last night on our project machine which produced
> consistent pg_statistics data and (of course) the same plans.

> We will next try runs where we vary the default buckets. Other than
> 10 and 1000, what numbers would you like us to try besides.
> Previously the number 100 was mentioned. Are there others?

That presumably depends on what your goal is.

A useful experiment would be to see at what point (e.g. - at what
bucket size) plans tend to "settle down" to the right values.

It might well be that defaulting to 23 buckets (I'm picking that out
of thin air) would cause the plans to typically be stable whatever
seed got used.

A test for this would be to, for each bucket size value, repeatedly
ANALYZE and check query plans.

At bucket size 10, you have seen the query plans vary quite a bit.

At 1000, they seem to stabilize very well.

The geometric centre, between 10 and 1000, is 100, so it would surely
be useful to see if query plans are stable at that bucket size.

The most interesting number to know would be the lowest number of
buckets at which query plans are nearly always stable. Supposing that
number was 23 (the number I earlier pulled out of the air), then that
can be used as evidence that the default value for SET STATISTICS
should be changed from 10 to 23.
--
wm(X,Y):-write(X),write('@'),write(Y). wm('aa454','freenet.carleton.ca').
http://www3.sympatico.ca/cbbrowne/nonrdbms.html
Sturgeon's Law: 90% of *EVERYTHING* is crud.

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Tom Lane 2003-09-11 04:30:59 Re: [GENERAL] how to get accurate values in pg_statistic (continued)
Previous Message Josh Berkus 2003-09-11 02:50:08 Re: Reading data in bulk - help?