Re: Increasing statistics results in worse estimates

From: Shelby Cain <alyandon(at)yahoo(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general(at)postgresql(dot)org
Subject: Re: Increasing statistics results in worse estimates
Date: 2005-04-29 16:54:11
Message-ID: 20050429165411.47188.qmail@web50101.mail.yahoo.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general pgsql-hackers-win32


--- Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Hm. I should have realized why correlation wouldn't
> be high for the
> city name: given the ordering by zipcode, city name
> values may be
> pretty well clumped, but they aren't in any kind of
> alphabetical
> order --- and it's the overall ordering, not the
> clumping, that
> correlation measures.
>

Ah. Localized clumping != Overall ordering. Thanks
for the clarification.

> However, there is something absolutely wacko about
> the stats collection
> process here ... you've got fairly reasonable
> looking results for
> most-common-values of city name at the lower end of
> the stats settings
> (HOUSTON and DALLAS are the most common, sounds
> about right) ... but at
> the higher settings the ordering of most-common
> entries just goes nuts.
> We've got some kind of bug there.
>

I had noticed that as well but wasn't sure about the
whether MCV really meant what I thought it did.

> What exactly are you changing in the different cases
> ---
> default_statistics_target, or are you doing an ALTER
> TABLE on some
> of the columns (if so which)?

I have a setting of 30 for default_statistics_target
and I am manipulating the statistics target for city
by alter table.

>
> It might be easier to debug this if you could send
> me the test case.
> Any problem with sending just the city name and
> zipcode columns
> of the table (offlist of course)? COPY TO with a
> column list can
> extract that for you.
>

I had already removed proprietary data to try and
whittle down the number of columns I needed to
demonstrate the weirdness so I can host a dump of the
table. However, before I take that step I should
mention that this is the native Windows port so if
that changes anything let me know.

Regards,

Shelby Cain

__________________________________________________
Do You Yahoo!?
Tired of spam? Yahoo! Mail has the best spam protection around
http://mail.yahoo.com

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Michael Fuhr 2005-04-29 16:58:08 Re: out of memory for query result
Previous Message James Robinson 2005-04-29 16:43:32 Re: Composite types as columns used in production?

Browse pgsql-hackers-win32 by date

  From Date Subject
Next Message Tom Lane 2005-04-29 17:26:04 Re: Increasing statistics results in worse estimates
Previous Message Mark Miller 2005-04-28 21:05:04 ERROR: Could not find function