Re: Cross-column statistics revisited

From:	Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
Cc:	Joshua Tolley <eggyknap(at)gmail(dot)com>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Cross-column statistics revisited
Date:	2008-10-17 12:46:11
Message-ID:	14436.1224247571@sss.pgh.pa.us
Lists:	pgsql-hackers

Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> Just a note: using multidimensional histograms will work well for the
> cases like (startdate,enddate) where the histogram will show a
> clustering of values along the diagonal. But it will fail for the case
> (zipcode,state) where one implies the other. Histogram-wise you're not
> going to see any correlation at all

Huh? Sure you are. What the histogram will show is that there is only
one state value per zipcode, and only a limited subset of zipcodes per
state. The nonempty cells won't cluster along the "diagonal" but we
don't particularly care about that.

What we really want from this is to not think that
	WHERE zip = '80210' AND state = 'CA'
is significantly more selective than just
	WHERE zip = '80210'
A histogram is certainly capable of telling us that. Whether it's the
most compact representation is another question of course --- in an
example like this, only about 1/50th of the cells would contain nonzero
counts ...
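
As a rough illustration of the point above, here is a toy Python sketch (not PostgreSQL planner code). The rows, the two-state setup, and the helper names sel_zip, sel_independent and sel_joint are all made up for the example. It compares the selectivity you get by multiplying per-column estimates, which assumes the columns are independent, with the selectivity read from a two-dimensional table of (zip, state) counts.

```python
from collections import Counter

# Hypothetical toy table: every zip code belongs to exactly one state.
rows = (
    [("80210", "CO")] * 30
    + [("80302", "CO")] * 20
    + [("90210", "CA")] * 40
    + [("94105", "CA")] * 10
)
total = len(rows)

# Per-column statistics, which is all an independence-based estimate can use.
zip_counts = Counter(z for z, _ in rows)
state_counts = Counter(s for _, s in rows)

# Two-dimensional "histogram": one cell per (zip, state) pair that occurs.
joint = Counter(rows)


def sel_zip(z):
    """Selectivity of WHERE zip = z."""
    return zip_counts[z] / total


def sel_independent(z, s):
    """Selectivity of WHERE zip = z AND state = s, assuming independence."""
    return (zip_counts[z] / total) * (state_counts[s] / total)


def sel_joint(z, s):
    """Selectivity of the same clause, read from the 2-D cell counts."""
    return joint[(z, s)] / total


# Fraction of the full zip x state grid that holds any rows at all.
nonempty_fraction = len(joint) / (len(zip_counts) * len(state_counts))

print(sel_zip("80210"), sel_independent("80210", "CO"), sel_joint("80210", "CO"))
print(nonempty_fraction)
```

With this data the joint estimate for a matching (zip, state) pair equals the zip-only estimate, while the independence estimate is smaller by the state's overall frequency. Only one cell per zip code is nonempty, so with N states roughly 1/N of the grid carries counts.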

regards, tom lane

In response to

Re: Cross-column statistics revisited at 2008-10-17 06:24:21 from Martijn van Oosterhout
