Re: Stats for multi-column indexes

From: Jeff Davis <pgsql(at)j-davis(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Stats for multi-column indexes
Date: 2007-03-20 01:55:56
Message-ID: 1174355756.23455.470.camel@dogma.v10.wvs
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2007-03-19 at 21:24 -0400, Tom Lane wrote:
> Jeff Davis <pgsql(at)j-davis(dot)com> writes:
> > We can already keep stats for a functional index. Is there a reason we
> > can't keep stats for a multi-column index?
>
> The questions that need to be answered are (1) what stats are you gonna
> collect, and (2) exactly what are you going to do with them when you
> have 'em?
>
> All the previous discussions have stalled on the question of how to
> avoid trying to collect stats about an exponentially large number of
> column combinations; we've never even reached the question of what
> stats we'd actually want given that a particular combination has been
> determined to be interesting. Perhaps that's a trivial question,
> but it's been a mighty long time since I took statistics ...
>

I know we can't keep stats on every combination of columns. My initial
idea would be to only keep stats about a multi-column index (and
probably optional for those, too).

My thinking was that we could keep a histogram (and MCVs, etc.) of the
non-scalar key in the multi-column index. That would provide the data
the planner needs to answer a query like "WHERE a = 1 and b < 1000" if a
and b are dependent and you have an index on (a,b).

It seemed within reach to me initially because I could use a functional
index (in which the function turns multiple values into a comparable
scalar) and postgresql would index that and keep stats. And when it has
those stats, it makes the correct plan. Of course, I have to litter the
SQL with unnecessary function calls (so that it can use the functional
index), which makes this undesirable.

AndrewSN pointed out on IRC that keeping a histogram of non-scalar
values is not as easy as I thought, because PostgreSQL doesn't allow
arrays of composite types, among other problems.

Is this a worthwhile area of exploration?

Regards,
Jeff Davis

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Kirkwood 2007-03-20 02:14:29 Re: Stats for multi-column indexes
Previous Message Andrew Dunstan 2007-03-20 01:35:36 Re: buildfarm not reporting xml flag correctly?