Re: Cross-column statistics revisited

From:	Martijn van Oosterhout <kleptog(at)svana(dot)org>
To:	Joshua Tolley <eggyknap(at)gmail(dot)com>
Cc:	PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: Cross-column statistics revisited
Date:	2008-10-16 17:11:26
Message-ID:	20081016171126.GB19967@svana.org
Lists:	pgsql-hackers

On Wed, Oct 15, 2008 at 04:53:10AM -0600, Joshua Tolley wrote:
> I've been interested in what it would take to start tracking
> cross-column statistics. A review of the mailing lists as linked from
> the TODO item on the subject [1] suggests the following concerns:
>
> 1) What information exactly would be tracked?
> 2) How would it be kept from exploding in size?
> 3) For which combinations of columns would statistics be kept?

I think you need to go a step back: how are you going to use this data?
Whatever structure you choose, the eventual goal is to take a description
of the columns (a,b) and a clause like 'a < 5' and be able to
generate an estimate of the distribution of b.
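
A minimal sketch of that goal, assuming the simplest possible
"description" of (a,b): a two-dimensional bucket histogram. The function
names and bucket edges here are hypothetical and are not anything that
exists in PostgreSQL; the sketch also assumes the clause boundary falls
exactly on a bucket edge, where a real estimator would have to
interpolate within a bucket.

```python
from bisect import bisect_right
from collections import Counter


def bucket(value, edges):
    """Return the index of the bucket holding value.

    edges is an ascending list of boundaries; bucket i holds values
    v with edges[i-1] <= v < edges[i].
    """
    return bisect_right(edges, value)


def build_joint_histogram(rows, a_edges, b_edges):
    """Count rows per (a-bucket, b-bucket) pair."""
    hist = Counter()
    for a, b in rows:
        hist[(bucket(a, a_edges), bucket(b, b_edges))] += 1
    return hist


def estimate_b_given_a_less_than(hist, a_edges, limit):
    """Estimate the fraction of rows per b-bucket among rows with a < limit.

    Assumption: limit is one of a_edges, so whole buckets are either
    entirely below the limit or entirely at or above it.
    """
    a_max = bucket(limit, a_edges)  # buckets with index < a_max satisfy a < limit
    counts = Counter()
    for (a_idx, b_idx), n in hist.items():
        if a_idx < a_max:
            counts[b_idx] += n
    total = sum(counts.values())
    if total == 0:
        return {}
    return {b_idx: n / total for b_idx, n in sorted(counts.items())}


# Toy data in which small a goes with small b.
rows = [(1, 10), (2, 12), (3, 11), (7, 50), (8, 55), (9, 52)]
a_edges = [5]
b_edges = [30]

hist = build_joint_histogram(rows, a_edges, b_edges)
b_dist = estimate_b_given_a_less_than(hist, a_edges, 5)
```

With per-column statistics alone, b would look evenly split between its
two buckets; conditioning on 'a < 5' through the joint histogram puts
every matching row in the low b bucket.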

Secondly, people aren't going to ask for multi-column stats on columns
that aren't correlated in some way. So you need to work out what kinds
of correlation people are interested in and see how you can store them.

One potential use case is the (startdate,enddate) columns. Here what
you want is to detect somehow that the distribution of (enddate-startdate)
is constant.
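
One hedged way to picture that detection, assuming a sample of
(startdate, enddate) pairs is available: compare the spread of the
difference with the spread of enddate itself. The helper name and the
choice of standard deviation as the measure are assumptions made for
illustration, not a proposal for what ANALYZE should store.

```python
from datetime import date
from statistics import pstdev


def duration_spread_ratio(rows):
    """Spread of (enddate - startdate) relative to the spread of enddate.

    A ratio near 0 means the difference is close to constant even
    though the dates themselves vary. Returns None when every enddate
    is identical, since the ratio is undefined in that case.
    """
    durations = [(end - start).days for start, end in rows]
    end_days = [end.toordinal() for _, end in rows]
    end_spread = pstdev(end_days)
    if end_spread == 0:
        return None
    return pstdev(durations) / end_spread


# Toy sample: every row lasts exactly seven days.
sample = [
    (date(2008, 1, 1), date(2008, 1, 8)),
    (date(2008, 3, 1), date(2008, 3, 8)),
    (date(2008, 6, 1), date(2008, 6, 8)),
]
ratio = duration_spread_ratio(sample)
```

For this sample the ratio is 0, which is the signal that the pair is
better described by the distribution of the difference than by two
independent columns.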

I think the real question is: what other kinds of correlation might
people be interested in representing?

Have a nice day,
--
Martijn van Oosterhout <kleptog(at)svana(dot)org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

In response to

Cross-column statistics revisited at 2008-10-15 10:53:10 from Joshua Tolley

Responses

Re: Cross-column statistics revisited at 2008-10-16 17:20:30 from Tom Lane
Re: Cross-column statistics revisited at 2008-10-16 17:34:59 from Robert Haas
