Thoughts on statistics for continuously advancing columns

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Nathan Boley <npboley(at)gmail(dot)com>
Subject: Thoughts on statistics for continuously advancing columns
Date: 2009-12-30 02:52:27
Message-ID: 4B3AC06B.8090004@agliodbs.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

All,

One of our clients is having query plan issues with a table with a
continuously advancing timestamp column (i.e. one with default now()).
The newest rows, which are the most in demand, are always estimated to
be fewer than they are or even non-existant. As a result, the user has
to analyze the table every hour ... and it's a very large table.

I've seen this in a lot of other databases, both with timestamp columns
and with SERIALs -- both of which are very common table structures.
>From my reading of the planner code, the problem seems to be the
histgram bounds ... if a requested value is above the high bound, it's
assumed to be extremely uncommon or not exist. This leads to bad plans
if analyze hasn't been run very recently.

My thoughts on dealing with this intelligently without a major change to
statstics gathering went along these lines:

1. add columns to pg_statistic to hold estimates of upper and lower
bounds growth between analyzes.

2. every time analyze is run, populate these columns with 1/2 of the
proprotion of values above or below the previously stored bounds,
averaged with the existing value for the new columns.

3. use this factor instead of the existing algorithm to calculate the
row estimate for out-of-bounds values.

This is obviously a very rough idea, but I wanted to get feedback on the
general problem and my approach before going further with it.

Thanks!

--Josh Berkus

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Farina 2009-12-30 02:56:16 Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION
Previous Message Robert Haas 2009-12-30 02:48:11 Re: [PATCH 4/4] Add tests to dblink covering use of COPY TO FUNCTION