Re: pgsql: When estimating the selectivity of an inequality "column >

From: Greg Stark <gsstark(at)mit(dot)edu>
To: Tom Lane <tgl(at)postgresql(dot)org>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: pgsql: When estimating the selectivity of an inequality "column >
Date: 2010-01-04 13:29:52
Message-ID: 407d949e1001040529k9ec48a8vd0b6d54c33001f06@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-committers pgsql-hackers

On Mon, Jan 4, 2010 at 2:44 AM, Tom Lane <tgl(at)postgresql(dot)org> wrote:
> Log Message:
> -----------
> When estimating the selectivity of an inequality "column > constant" or
> "column < constant", and the comparison value is in the first or last
> histogram bin or outside the histogram entirely, try to fetch the actual
> column min or max value using an index scan (if there is an index on the
> column).  If successful, replace the lower or upper histogram bound with
> that value before carrying on with the estimate.  This limits the
> estimation error caused by moving min/max values when the comparison
> value is close to the min or max.  Per a complaint from Josh Berkus.

So.... This is pretty cool but it's worth discussing the downsides.
This is the second bit of live information the planning process is
peeking at to supplement the statistics (the first is the physical
size of the table). There are two use cases that I see being
negatively impacted by these features, neither of which we support
currently but I expect we'll eventually support and when we do we'll
have to work around these problems. The two problematic use cases I
see are: plan stability and exporting statistics to a another machine
to reproduce behaviour on a test machine.

I'm not sure what approach we'll have to take to work around these
problems. Perhaps we just need a way to disable checking these bits of
information and have a fallback strategy? perhaps we need a hook for
each of these fetches so you can store the size and upper bound of the
table on the production system you're trying to replicate the
behaviour of and a hook which substitutes these values for the dynamic
value? Perhaps the hook could even use a dbilink connection to get the
live production values.

I also wonder if we want to run these lookups on every single planner
invocation. If the table is being rarely updated analyze will never
fire and the stats value will never be updated. I don't think we want
the planner doing write operations to the stats either though. It
would be nice if there was a low-cost mode of operation for ANALYZE
which simulates precisely this update, the planner could send a stats
message saying that autovacuum should schedule one of these operations
sometime even if it doesn't see a need for a full analyze run.

--
greg

In response to

Responses

Browse pgsql-committers by date

  From Date Subject
Next Message Alvaro Herrera 2010-01-04 14:50:21 Re: pgsql: When estimating the selectivity of an inequality "column >
Previous Message Heikki Linnakangas 2010-01-04 12:50:50 pgsql: Write an end-of-backup WAL record at pg_stop_backup(), and wait

Browse pgsql-hackers by date

  From Date Subject
Next Message Mark Cave-Ayland 2010-01-04 13:51:15 Re: Add subdirectory support for DATA/DOCS with PGXS
Previous Message Pavel Stehule 2010-01-04 12:56:41 Re: quoting psql varible as identifier