Cross column statistics

From: Greg Stark <gsstark(at)mit(dot)edu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Cross column statistics
Date: 2005-02-05 20:44:52
Message-ID: 87sm4aka7v.fsf@stark.xeocode.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Just brain storming a bit here. It seems to me there are two approaches for
cross-column statistics within a single table:

A) Treat an index on <a,b,c> the same way Postgres treats an expression index
on ROW(a,b,c). That is, gather full statistics on the distribution of that
ntuple of columns. I think this would be the easiest option for the
analyzer. But:

a) The optimizer would then need to do work to detect when all columns are
present in the constraints and deduce the ntuple to look for in the
statistics.

b) It would only help if all the columns are used. I'm not sure how easy it
would be to generalize this to queries using <a,b> or worse, <b,c>.

c) It would only work if you create an index on the set of columns being
queried. Often people have things like

SELECT * FROM tab WHERE indexed_col = ? AND deleted_flag IS false

where deleted_flag *isn't* indexed or is a where clause on a partial index.

B) gather a full matrix of the level of "correlation" between each column and
each other column. If this were a single floating point number per pair
then it might be feasible. It would still obviously be n^2 in the number of
columns though, so there would have to be some way to limit on how many
columns would be analyzed this way.

The problem is that's it's *very* unclear how to gather this information
using a sample in any remotely efficient manner. It's not even clear what
this number would be measuring.

It's not actually "correlation" that Postgres usually needs. It's "How many
distinct values of b do we expect to find given a=a_0". Or rather how many
do we expect to find relative to how many we would normally expect to find
if the columns were independent.

--
greg

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2005-02-05 22:05:07 Re: Fixing flat user/group files at database startup
Previous Message Stephen Frost 2005-02-05 20:26:22 Re: Fixing flat user/group files at database startup