Re: self-tuning histograms

From: Gavin Sherry <swm(at)linuxworld(dot)com(dot)au>
To: Neil Conway <nconway(at)klamath(dot)dyndns(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: self-tuning histograms
Date: 2002-05-30 03:52:08
Message-ID: Pine.LNX.4.21.0205301322500.31132-100000@linuxworld.com.au
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Neil,

I've also been thinking about this but haven't had time to collect my
thoughts.

On Wed, 29 May 2002, Neil Conway wrote:

> Histogram refinement can take place in two possible ways: online
> (as queries are executed, the histograms are updated immediately),
> or offline (the necessary data is written to a log after every
> query, which is processed on a regular basis to refine the
> histograms).

I would have thought that offline would have meant that the histogram
refinement could be run at the DBA's leisure.

> There are some disadvantages, however:
>
> (1) Reproduceability: At the moment, the system's performance
> only changes when the data is changed, or the DBA makes a
> configuration change. With this (and other "self-tuning"
> techniques, which are becoming very popular among
> commercial databases), the system can change the state of
> the system without the intervention of the DBA. While I'd
> hope that those changes are for the better (i.e. histograms
> eventually converging toward "perfect" accuracy), that
> won't always be the case. I don't really see a way around
> this, other than letting the DBA disable ST histograms
> when debugging problems.

Self-tuning would have to be optional.

> (2) Performance: As Aboulnaga and Shaudhuri point out, online
> histogram refinement can become a point of contention.
> Obviously, we want to avoid that. I think online refinement
> is still possible as long as we:
>
> (a) don't block waiting for locks: try to acquire the
> necessary locks to refine the histograms,
> immediately give up if not possible
>
> (b) delay histogram refinement so it doesn't interfere
> with the user: for example, store histogram data
> locally and only update the system catalogs when
> the backend is idle

This should be fine as long as the refinement system works through MVCC.

There is another consideration. If a database is using histogram
refinement then the 'base' data it works on must be accurate. If not,
refinement would compound the inaccuracy of the histogram. As such,
ANALYZE would have to scan the whole table (if/when run), COPY would have
to update the statistics, etc.

Gavin

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2002-05-30 03:58:23 Re: ipv6
Previous Message Neil Conway 2002-05-30 03:05:18 self-tuning histograms