Re: -HEAD planner issue wrt hash_joins on dbt3 ?

From: Mark Cave-Ayland <mark(dot)cave-ayland(at)ilande(dot)co(dot)uk>
To: Matteo Beccati <php(at)beccati(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stefan Kaltenbrunner <stefan(at)kaltenbrunner(dot)cc>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: -HEAD planner issue wrt hash_joins on dbt3 ?
Date: 2006-09-19 13:55:38
Message-ID: 1158674138.5800.15.camel@mca-desktop
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 2006-09-18 at 17:46 +0200, Matteo Beccati wrote:
> Tom Lane ha scritto:
> > Matteo Beccati <php(at)beccati(dot)com> writes:
> >> I cannot see anything bad by using something like that:
> >> if (histogram is large/representative enough)
> >
> > Well, the question is exactly what is "large enough"? I feel a bit
> > uncomfortable about applying the idea to a histogram with only 10
> > entries (especially if we ignore two of 'em). With 100 or more,
> > it sounds all right. What's the breakpoint?
>
> Yes, I think 100-200 could be a good breakpoint. I don't actually know
> what is the current usage of SET STATISTICS, I usually set it to 1000
> for columns which need more precise selectivity.
>
> The breakpoint could be set even higher (500?) so there is space to
> increase statistics without enabling the histogram check, but I don't
> feel very comfortable though suggesting this kind of possibly
> undocumented side effect...

Hi everyone,

You may be interested to have a look at the statistics collector for the
geometry type within PostGIS. In order to prevent very large or very
small geometries from ruining the statistics histogram and generating
incorrect query plans, we make the assumption that the column
distribution is likely to be close to normal, and then remove any
ANALYZE-collected geometries from the set that lie outside +/- 3.25
standard deviations from the mean before creating the final histogram
(removes just under 1% of the data from each end of an assumed normal
distribution). This works well and AFAIK we've only ever had one
reported case of an incorrect query plan being generated using this
method.

Kind regards,

Mark.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2006-09-19 13:59:41 Re: system cache and buffer cache
Previous Message Jim C. Nasby 2006-09-19 13:53:05 Re: [PATCHES] Patch for UUID datatype (beta)