From: | "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com> |
---|---|
To: | Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
Cc: | Nolan Cafferky <Nolan(dot)Cafferky(at)rbsinteractive(dot)com>, pgsql-performance(at)postgresql(dot)org |
Subject: | Re: Cluster vs. non-cluster query planning |
Date: | 2006-05-02 21:29:42 |
Message-ID: | 20060502212942.GE97354@pervasive.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-performance |
On Mon, May 01, 2006 at 07:35:02PM -0400, Tom Lane wrote:
> Nolan Cafferky <Nolan(dot)Cafferky(at)rbsinteractive(dot)com> writes:
> > But, I'm guessing that random_page_cost = 1 is not a realistic value.
>
> Well, that depends. If all your data can be expected to fit in memory
> then it is a realistic value. (If not, you should be real careful not
> to make performance decisions on the basis of test cases that *do* fit
> in RAM...)
>
> In any case, if I recall your numbers correctly you shouldn't need to
> drop it nearly that far to get the thing to make the right choice.
> A lot of people run with random_page_cost set to 2 or so.
Also, the index scan cost estimator comments indicate that it does a
linear interpolation between the entimated cost for a perfectly
correlated table and a table with 0 correlation, but in fact the
interpolation is exponential, or it's linear based on the *square* of
the correlation, which just doesn't make much sense.
I did some investigating on this some time ago, but never got very far
with it. http://stats.distributed.net/~decibel/summary.txt has some
info, and http://stats.distributed.net/~decibel/ has the raw data.
Graphing that data, if you only include correlations between 0.36 and
0.5, it appears that there is a linear correlation between correlation
and index scan time.
Of course this is very coarse data and it'd be great if someone did more
research in this area, preferably using pg_bench or other tools to
generate the data so that others can test this stuff as well. But even
with as rough as this data is, it seems to provide a decent indication
that it would be better to actually interpolate linearly based on
correlation, rather than correlation^2. This is a production machine so
I'd rather not go mucking about with testing such a change here.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
From | Date | Subject | |
---|---|---|---|
Next Message | Jim C. Nasby | 2006-05-02 21:38:47 | Re: Postgres 7.4 and vacuum_cost_delay. |
Previous Message | Tony Wasson | 2006-05-02 20:50:43 | Re: postgresql transaction id monitoring with nagios |