Re: self-tuning histograms

From: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
To: "Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: self-tuning histograms
Date: 2002-05-30 04:42:10
Message-ID: 20020530004210.2adb6137.nconway@klamath.dyndns.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, 30 May 2002 13:52:08 +1000 (EST)
"Gavin Sherry" <swm(at)linuxworld(dot)com(dot)au> wrote:
> On Wed, 29 May 2002, Neil Conway wrote:
> > Histogram refinement can take place in two possible ways: online
> > (as queries are executed, the histograms are updated immediately),
> > or offline (the necessary data is written to a log after every
> > query, which is processed on a regular basis to refine the
> > histograms).
>
> I would have thought that offline would have meant that the histogram
> refinement could be run at the DBA's leisure.

Yeah -- that makes more sense.

> > (2) Performance: As Aboulnaga and Shaudhuri point out, online
> > histogram refinement can become a point of contention.
> > Obviously, we want to avoid that.
>
> This should be fine as long as the refinement system works through MVCC.

Good point -- the current pg_statistic routines are MVCC aware, so
there's no reason to change that. In that case, my concerns about
contention over histogram refinement may be unfounded. I still think we
should avoid redundant histogram refinements (i.e. don't update the
histograms on every single query), but I'm glad that MVCC solves most
of this problem for us.

> There is another consideration. If a database is using histogram
> refinement then the 'base' data it works on must be accurate. If not,
> refinement would compound the inaccuracy of the histogram.

Aboulnaga and Shaudhuri doesn't require this. The initial assumption
is that the attribute is uniformly distributed over the initial
buckets of the histogram. Naturally, this is incorrect: as queries
are executed, the initial histogram is modified by "refinement"
(the frequences of individual buckets are adjusted) and
"restructuring" (bucket boundaries are adjusted). For more
information (and the exact algorithms used), see sections 3.2
and 3.3 of the paper.

Cheers,

Neil

--
Neil Conway <neilconway(at)rogers(dot)com>
PGP Key ID: DB3C29FC

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Oleg Bartunov 2002-05-30 10:13:13 Re: Null values in indexes
Previous Message Tom Lane 2002-05-30 03:58:23 Re: ipv6