Re: Weird indices

From: "Richard Huxton" <dev(at)archonet(dot)com>
To: "Martijn van Oosterhout" <kleptog(at)svana(dot)org>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: <pgsql-general(at)postgresql(dot)org>
Subject: Re: Weird indices
Date: 2001-02-22 10:04:18
Message-ID: 004101c09cb6$d573f060$1001a8c0@archonet.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

From: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>

> Martijn van Oosterhout <kleptog(at)svana(dot)org> writes:
> > We have a table with over 1 million rows and the statistics Postgres
gathers
> > are not particularly useful. There is not one (non-null) value that
occurs
> > significantly more often than other values but the distribution looks a
lot
> > like a 1/x curve I guess. The most common value occurs 5249 times but
the
> > average is only 95, so Postgres chooses seq scan almost always. We
actually
> > now set enable_seqscan=off in many areas of our code to speed it up to a
> > useful rate. (This table also happens to have an (accedental) clustering
on
> > this column also).
>
> > What is the reasoning behind estimating like that? Why not just the
average
> > or the average + 1 SD?
>
> Can you think of a reasonable algorithm for VACUUM to obtain the true
> average frequency? It has a tough enough time estimating the most
> common frequency with any reliability. Given complaints in nearby
> threads that VACUUM ANALYZE is too slow, it'd be a good idea if your
> algorithm was faster than the current one, too ;-)

I'm don't see that there's any way that you're going to get an analyser that
_always_ gets it right. Might there not be some way of explicitly telling
the analyser the distribution of the data. Like Martijn says above, he
thinks the distribution is something like 1/x. In the cases where you really
care you probably do know what sort of values are stored.

I have to admit my maths isn't good enough to say how sensible an idea this
is, but figured I'd put my tuppence-worth in.

- Richard Huxton

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Alexander Bodnar 2001-02-22 10:21:05 Help with plpgsql - subst variable value
Previous Message hubert depesz lubaczewski 2001-02-22 09:39:19 Re: problem while compiling user c functions in 7.1beta4