Re: Cluster vs. non-cluster query planning

From: "Jim C(dot) Nasby" <jnasby(at)pervasive(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Nolan Cafferky <Nolan(dot)Cafferky(at)rbsinteractive(dot)com>, pgsql-performance(at)postgresql(dot)org
Subject: Re: Cluster vs. non-cluster query planning
Date: 2006-05-02 21:29:42
Message-ID: 20060502212942.GE97354@pervasive.com
Lists: pgsql-performance

On Mon, May 01, 2006 at 07:35:02PM -0400, Tom Lane wrote:
> Nolan Cafferky <Nolan(dot)Cafferky(at)rbsinteractive(dot)com> writes:
> > But, I'm guessing that random_page_cost = 1 is not a realistic value.
>
> Well, that depends. If all your data can be expected to fit in memory
> then it is a realistic value. (If not, you should be real careful not
> to make performance decisions on the basis of test cases that *do* fit
> in RAM...)
>
> In any case, if I recall your numbers correctly you shouldn't need to
> drop it nearly that far to get the thing to make the right choice.
> A lot of people run with random_page_cost set to 2 or so.

Also, the index scan cost estimator comments indicate that it does a
linear interpolation between the estimated cost for a perfectly
correlated table and a table with 0 correlation, but in fact the
interpolation is linear in the *square* of the correlation, which just
doesn't make much sense.
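Here is a minimal sketch of the two interpolation schemes. It assumes the planner has already produced a `min_io` estimate (perfectly correlated table) and a `max_io` estimate (zero correlation) for the heap I/O of an index scan. The squared weighting mirrors the `csquared` term in PostgreSQL's `cost_index()` in costsize.c. The function names and endpoint values below are hypothetical and only illustrate the shape of each curve.

```python
def io_cost_squared(min_io: float, max_io: float, correlation: float) -> float:
    """Interpolate between the uncorrelated (max_io) and perfectly
    correlated (min_io) heap I/O estimates, weighted by correlation**2
    (the scheme the planner uses)."""
    csquared = correlation * correlation
    return max_io + csquared * (min_io - max_io)


def io_cost_linear(min_io: float, max_io: float, correlation: float) -> float:
    """Hypothetical alternative: same endpoints, weighted by |correlation|
    itself, i.e. a true linear interpolation on correlation."""
    return max_io + abs(correlation) * (min_io - max_io)


if __name__ == "__main__":
    # Illustrative endpoint costs only; not taken from any real plan.
    min_io, max_io = 100.0, 1000.0
    for c in (0.0, 0.25, 0.5, 0.75, 1.0):
        print(f"correlation {c:.2f}: "
              f"squared {io_cost_squared(min_io, max_io, c):7.1f}  "
              f"linear {io_cost_linear(min_io, max_io, c):7.1f}")
```

Both schemes agree at correlations of 0 and 1 and differ in between. At a correlation of 0.5 with these endpoints, the squared form still charges 775 of the 1000-unit uncorrelated cost, while the linear form charges 550. The `abs()` is needed in the linear version because a correlation of -1 is just as "ordered" as +1, which squaring handles automatically.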

I did some investigating on this some time ago, but never got very far
with it. http://stats.distributed.net/~decibel/summary.txt has some
info, and http://stats.distributed.net/~decibel/ has the raw data.
If you graph that data and only include correlations between 0.36 and
0.5, there appears to be a linear relationship between correlation and
index scan time.
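The check described above can be repeated on other measurements with a small least-squares sketch. It keeps only points with correlation in the 0.36 to 0.5 band mentioned in the message, then fits scan time once against correlation and once against correlation squared. A smaller residual sum of squares indicates the better-fitting model. The function names are hypothetical, and the code assumes you already have paired (correlation, time) lists. It does not parse the files at the URLs above, since their format is not assumed here.

```python
from typing import List, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least-squares fit of ys = a + b*xs.
    Returns (a, b, residual sum of squares)."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need at least two paired points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("xs must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def compare_models(correlations: Sequence[float], times: Sequence[float],
                   lo: float = 0.36, hi: float = 0.5) -> Tuple[float, float]:
    """Restrict to lo <= correlation <= hi, then return the residual sum of
    squares for time ~ correlation and for time ~ correlation**2."""
    pts: List[Tuple[float, float]] = [
        (c, t) for c, t in zip(correlations, times) if lo <= c <= hi
    ]
    cs = [c for c, _ in pts]
    ts = [t for _, t in pts]
    _, _, rss_linear = fit_line(cs, ts)
    _, _, rss_squared = fit_line([c * c for c in cs], ts)
    return rss_linear, rss_squared
```

Over a band as narrow as 0.36 to 0.5, correlation and its square are themselves close to linearly related, so the two residuals may differ only slightly. Data spanning a wider range of correlations would separate the two models more clearly.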

Of course this is very coarse data and it'd be great if someone did more
research in this area, preferably using pgbench or other tools to
generate the data so that others can test this stuff as well. But even
as rough as this data is, it seems to provide a decent indication
that it would be better to actually interpolate linearly based on
correlation, rather than correlation^2. This is a production machine, so
I'd rather not go mucking about with testing such a change here.
--
Jim C. Nasby, Sr. Engineering Consultant jnasby(at)pervasive(dot)com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461
