From: | Ron Mayer <ron(at)cheapcomplexdevices(dot)com> |
---|---|
To: | pgsql(at)mohawksoft(dot)com |
Cc: | Mark Kirkwood <markir(at)coretech(dot)co(dot)nz>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Query optimizer 8.0.1 (and 8.0) |
Date: | 2005-02-08 03:15:50 |
Message-ID: | 42082EE6.3070401@cheapcomplexdevices.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
pgsql(at)mohawksoft(dot)com wrote:
>
> In this case, the behavior observed could be changed by altering the
> sample size for a table. I submit that an arbitrary fixed sample size is
> not a good base for the analyzer, but that the sample size should be based
> on the size of the table or some calculation of its deviation.
>
Mark,
Do you have any evidence that the Sample Size had anything to do
with the performance problem you're seeing?
I also do a lot with the complete Census/TIGER database.
Every problem I have with the optimizer comes down to the
fact that the data is loaded (and ordered on disk) by
State/County FIPS codes, and then queried by zip-code
or by city name. Like this:
Alabama 36101 [hundreds of pages with zip's in 36***]
Alaska 99686 [hundreds of pages with zip's in 9****]
Arizona 85701 [hundreds of pages with zip's in 855**]
Note that the zip codes are *NOT* sequential.
The "correlation" statistic sees that the Zip codes are not
sequential; so it makes the *HORRIBLE* assumption that they
are scattered randomly across the disk.
In reality, even though there's no total ordering of the
zip codes; any given zip code only exists on a couple
disk pages; so index scans would be the right choice.
But the single correlation parameter is not sufficient
to let the optimizer known this.
No matter how large a sample size you choose, ANALYZE
will correctly see that Zip codes and State FIPS codes
are non-correlated, and the optimizer will overestimate
the # of pages an index scan will need.
Ron
PS: I pointed out workarounds in my earlier posting
in this thread. Yes, I'm using the same TIGER data
you are.
From | Date | Subject | |
---|---|---|---|
Next Message | Robby Russell | 2005-02-08 03:42:32 | Re: [GENERAL] PHP/PDO Database Abstraction Layer |
Previous Message | Greg Stark | 2005-02-08 02:02:54 | Re: Query optimizer 8.0.1 (and 8.0) |