Re: Stopgap solution for table-size-estimate updating

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Zeugswetter Andreas DAZ SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Stopgap solution for table-size-estimate updating
Date: 2004-11-28 22:16:19
Message-ID: 1101679551.2963.94.camel@localhost.localdomain
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 2004-11-28 at 18:52, Tom Lane wrote:
> Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> > On the topic of accuracy of the estimate: Updates cause additional data
> > to be written to the table, so tables get bigger until vacuumed. Tables
> > with many Inserts are also regularly trimmed with Deletes. With a
> > relatively static workload and a regular vacuum cycle, the table size
> > for many major tables eventually levels off, remaining roughly constant
> > but the number of non-zero pages will vary over time in a saw-tooth
> > curve. Estimating the cardinality by using the number of blocks would
> > ignore the fact that many of them are empty for much of the time. That
> > would then lead to a systematic over-estimate of the cardinality of the
> > regularly updated tables.
>
> You mean underestimate. After a VACUUM, the tuples-per-page figure
> would be set to a relatively low value, and then subsequent inserts
> would fill in the free space, causing the actual density to rise
> while the physical number of blocks stays more or less constant.
> So the proposed method would always give an accurate number of blocks,
> but it would tend to underestimate the number of tuples in a dynamic
> situation.

OK, just *wrong* then (the estimate, as well as myself) :-)

> Still, it's better than the current method, which is likely to
> underestimate both parameters. I believe that having an accurate block
> count and an underestimated tuple count would tend to favor choosing
> indexscans over seqscans, which is probably a good thing --- when was
> the last time you saw someone complaining that the planner had
> improperly chosen an indexscan over a seqscan?

Well, yes, but user perception is not always right.

Given we expect an underestimate, can we put in a correction factor
should the estimate get really low...sounds like we could end up
choosing nested joins more often when we should have chosen merge joins.
That is something that people do regularly complain about (indirectly).

> > How easy would it be to take into account the length of the FSM for the
> > relation also?
>
> Don't think this would help; the FSM doesn't really track number of
> tuples. Free space isn't a good guide to number of tuples because you
> can't distinguish inserts from updates at that level. (I'm also a bit
> concerned about turning the FSM into a source of contention.)

Agreed.

--
Best Regards, Simon Riggs

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2004-11-28 22:23:04 Re: [JDBC] Strange server error with current 8.0beta driver
Previous Message Bruce Momjian 2004-11-28 22:04:24 Re: SQL:2003 TODO items