Re: raising the default default_statistics_target

From: "scott(dot)marlowe" <scott(dot)marlowe(at)ihs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Neil Conway <neilc(at)samurai(dot)com>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: raising the default default_statistics_target
Date: 2004-03-09 17:21:41
Message-ID: Pine.LNX.4.33.0403091009280.6001-100000@css120.ihs.com
Lists: pgsql-hackers

On Sun, 7 Mar 2004, Tom Lane wrote:

> Neil Conway <neilc(at)samurai(dot)com> writes:
> > Tom Lane wrote:
> >> This is something we need to consider, but we'll need more evidence
> >> before making a choice. One thing that we have very little data about
> >> is how much difference it makes in the quality of planner choices.
>
> > Right, but is there a practical way to actually get this data?
>
> I haven't thought of one yet, but perhaps someone will have an idea.

Hi Tom. I ran some simple tests of analyze time and query planning time
on a very simple table with randomly distributed data. The index was on
a date field, since that's what I had been testing last.

This was all done on my 1.1GHz Celeron workstation with 512MB of memory
and an IDE drive. I'd love more input on better testing methodologies
here...

The tables have 100k or 1M rows that look like this (I'll test 10M rows
later; that dataset won't fit in memory, so there will be a lot of disk
access going on. Right now the 1M row table is 80MB):

select * from test2 limit 5;
                    info                      |         dt          |   id
----------------------------------------------+---------------------+---------
 Francize perfectible swirling fluctuates     | 2004-05-20 20:12:04 | 2721995
 Fields chauffeur attentionality grandmother  | 2004-04-07 14:36:02 | 2721996
 Belgium bilked explosively defendant         | 2004-09-16 16:27:22 | 2721997
 perspectives Buenos Pollux discriminates     | 2004-11-11 12:28:31 | 2721998
 Victorianize Savonarola blackmails sufficed  | 2004-02-27 21:17:20 | 2721999
(5 rows)
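
For anyone who wants to reproduce something like this, here's a rough
sketch of how a similar table can be built. This is not my exact script;
the filler strings, the generate_series() fill, and the date range are
just placeholders for the real word-salad data:

-- Sketch: build a test2-shaped table with randomly distributed dates.
-- The info strings are simple filler, not real dictionary words.
CREATE TABLE test2 (
    info text,
    dt   timestamp,
    id   integer
);

INSERT INTO test2 (info, dt, id)
SELECT 'filler row number ' || n,
       timestamp '2004-01-01' + random() * 365 * interval '1 day',
       n
FROM generate_series(1, 1000000) AS g(n);

CREATE INDEX test2_dt_idx ON test2 (dt);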

Here's what I get for analyze and plan times at different statistics
targets:

          100k      1M       1M
          analyze   analyze  plan
target    ms        ms       ms
    10       250       875     2
    20       350      1250
    30       430      1500
    40       520      1725
    50       580      1900
    60       690      2100
    70       775      2175
    80       850      2300
    90       950      2400
   100      1000      2600     2.5
   200      1806      3700
   300      2600      4800
   400      2600      5900
   500      2600      7200
   700      2600      9500
  1000      2600     13000     5
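
The measurement was roughly along these lines, repeated per target (a
sketch of the approach, not my exact session; psql's \timing reports
the elapsed time for each statement):

-- Sketch: time ANALYZE at one statistics target, then repeat
-- with each of the other target values.
\timing
SET default_statistics_target = 100;
ANALYZE test2;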

Since this data is randomly distributed, I didn't bother doing a lot of
testing to see how accurate each target setting was. If that would be
useful to know, I'd gladly test it, but I set out only to test the time
to analyze and the time to plan.

Note that I tested only three targets for planning time, as the target
didn't seem to make much difference there. The query was:

select * from test2 where dt between 'march 11, 2004' and 'march 13, 2004';
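
The plan times were approximated along these lines (a sketch: EXPLAIN
plans the query without executing it, so under \timing the elapsed time
is roughly planning time plus a little overhead):

-- Sketch: approximate planning time by timing EXPLAIN, which
-- plans the query but does not execute it.
\timing
EXPLAIN select * from test2
 where dt between 'march 11, 2004' and 'march 13, 2004';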

I also ran some quick tests on smaller tables (1000 and 10k rows), and
there the plateau we see in the 100k analyze column shows up much
sooner, at a target of around 50. That is, the analyze time flattened
out quickly, and higher targets cost very little, if anything.

Since this query produces quite a simple plan, I'd expect to need a much
more complex one to test the increase in planning time, say something
that has to look at a lot of statistics. Is there a particular join type
or construct that's likely to do that?
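
For instance, something with this shape is the kind of query I mean;
the orders, items, and customers tables here are made up purely for
illustration:

-- Hypothetical shape of a more planner-intensive query: several
-- joins plus range filters, so the planner consults statistics
-- on many columns when choosing join order and methods.
SELECT *
FROM test2 t
JOIN orders o    ON o.id = t.id
JOIN items i     ON i.order_id = o.id
JOIN customers c ON c.id = o.customer_id
WHERE t.dt BETWEEN '2004-03-11' AND '2004-03-13'
  AND o.total > 100;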
