Re: Cross-column statistics revisited

From:	Ron Mayer <rm_pg(at)cheapcomplexdevices(dot)com>
To:	Robert Haas <robertmhaas(at)gmail(dot)com>
Cc:	Martijn van Oosterhout <kleptog(at)svana(dot)org>, Joshua Tolley <eggyknap(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cross-column statistics revisited
Date:	2008-10-16 20:35:25
Message-ID:	48F7A58D.2090303@cheapcomplexdevices.com
Lists:	pgsql-hackers

Robert Haas wrote:
>> I think the real question is: what other kinds of correlation might
>> people be interested in representing?
>
> Yes, or to phrase that another way: What kinds of queries are being
> poorly optimized now and why?

The ones that affect our largest tables are those where we
have an address (or other geo-data) clustered by zip, but
with other columns (city, county, state, school-zone, police
beat, etc.) used in queries.

Postgres considers those unclustered (correlation 0 in the stats),
despite all rows for a given value residing on the same few pages.
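
To make the effect concrete, here is a rough Python sketch, not Postgres code. It builds a toy table stored in zip order and computes a simplified stand-in for the correlation statistic: the Pearson correlation between physical row position and the sort rank of the column value. The table sizes, the page size, and the city_rank() assignment are all made-up assumptions, chosen only so that alphabetical city order does not follow zip order.

```python
from math import sqrt

# All sizes below are hypothetical, picked only to keep the toy table small.
ROWS_PER_ZIP = 50
ZIPS_PER_CITY = 4
N_CITIES = 50
ROWS_PER_PAGE = 100


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def city_rank(block):
    """Alphabetical rank of the city covering the block-th group of zips.

    Ranks rise over the first half of the table and fall over the second,
    so city order is deliberately unrelated to zip order overall.
    """
    half = N_CITIES // 2
    return 2 * block if block < half else 2 * (N_CITIES - block) - 1


def build_table():
    """Rows of (zip index, city rank), stored physically in zip order."""
    rows = []
    for zip_idx in range(N_CITIES * ZIPS_PER_CITY):
        rank = city_rank(zip_idx // ZIPS_PER_CITY)
        rows.extend((zip_idx, rank) for _ in range(ROWS_PER_ZIP))
    return rows


rows = build_table()
positions = list(range(len(rows)))

zip_corr = pearson(positions, [z for z, _ in rows])
city_corr = pearson(positions, [c for _, c in rows])

# Count how many distinct pages hold the rows of each city.
pages_per_city = {}
for pos, (_, city) in enumerate(rows):
    pages_per_city.setdefault(city, set()).add(pos // ROWS_PER_PAGE)
max_pages = max(len(pages) for pages in pages_per_city.values())

print(f"zip correlation:  {zip_corr:.3f}")
print(f"city correlation: {city_corr:.3f}")
print(f"max pages holding one city: {max_pages}")
```

In this toy table the zip column correlates almost perfectly with physical order while the city column comes out near zero, even though every city's rows sit on just two pages. That is the mismatch described above: the correlation number alone suggests a scan on city would touch pages all over the table.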

I could imagine that this could be handled by either some cross-column
correlation (each zip has only 1-2 cities); or by an enhanced
single-column statistic (even though cities aren't sorted alphabetically,
all rows on a page tend to refer to the same city).
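
As a rough illustration of what those two kinds of statistic might measure (my own sketch in Python, not anything Postgres implements in this form), the snippet below takes a tiny table stored in zip order and counts distinct cities per zip (the cross-column view) and distinct cities per page (the per-page single-column view). The zip codes, city names, and page size are hypothetical.

```python
from collections import defaultdict

ROWS_PER_PAGE = 4  # hypothetical, tiny so the pages are easy to read off

# (zip, city) rows in physical order; the table is clustered by zip.
rows = (
    [("z01", "Cedar")] * 4
    + [("z02", "Cedar")] * 4
    + [("z03", "Alder")] * 3
    + [("z03", "Birch")] * 1
    + [("z04", "Birch")] * 4
)


def distinct_cities_per_group(rows, group_of):
    """Map each group key to the number of distinct cities seen in it."""
    groups = defaultdict(set)
    for pos, (zip_code, city) in enumerate(rows):
        groups[group_of(pos, zip_code)].add(city)
    return {key: len(cities) for key, cities in groups.items()}


# Cross-column view: how many cities does each zip map to?
per_zip = distinct_cities_per_group(rows, lambda pos, z: z)

# Per-page view: how many cities appear on each physical page?
per_page = distinct_cities_per_group(rows, lambda pos, z: pos // ROWS_PER_PAGE)

total_cities = len({city for _, city in rows})
avg_per_zip = sum(per_zip.values()) / len(per_zip)
avg_per_page = sum(per_page.values()) / len(per_page)

print(f"distinct cities overall:  {total_cities}")
print(f"avg cities per zip:       {avg_per_zip}")
print(f"avg cities per page:      {avg_per_page}")
```

Either number being close to 1 while the overall distinct count is larger would tell a planner that rows for one city are physically packed together, regardless of how the city names sort.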

In response to

Re: Cross-column statistics revisited at 2008-10-16 17:34:59 from Robert Haas
