Re: Stopgap solution for table-size-estimate updating problem

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Zeugswetter Andreas DAZ SD <ZeugswetterA(at)spardat(dot)at>, pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Stopgap solution for table-size-estimate updating problem
Date: 2004-11-28 18:52:24
Message-ID: 19266.1101667944@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs <simon(at)2ndquadrant(dot)com> writes:
> On the topic of accuracy of the estimate: Updates cause additional data
> to be written to the table, so tables get bigger until vacuumed. Tables
> with many Inserts are also regularly trimmed with Deletes. With a
> relatively static workload and a regular vacuum cycle, the table size
> for many major tables eventually levels off, remaining roughly constant
> but the number of non-zero pages will vary over time in a saw-tooth
> curve. Estimating the cardinality by using the number of blocks would
> ignore the fact that many of them are empty for much of the time. That
> would then lead to a systematic over-estimate of the cardinality of the
> regularly updated tables.

You mean underestimate. After a VACUUM, the tuples-per-page figure
would be set to a relatively low value, and then subsequent inserts
would fill in the free space, causing the actual density to rise
while the physical number of blocks stays more or less constant.
So the proposed method would always give an accurate number of blocks,
but it would tend to underestimate the number of tuples in a dynamic
situation.

Still, it's better than the current method, which is likely to
underestimate both parameters. I believe that having an accurate block
count and an underestimated tuple count would tend to favor choosing
indexscans over seqscans, which is probably a good thing --- when was
the last time you saw someone complaining that the planner had
improperly chosen an indexscan over a seqscan?

> How easy would it be to take into account the length of the FSM for the
> relation also?

Don't think this would help; the FSM doesn't really track number of
tuples. Free space isn't a good guide to number of tuples because you
can't distinguish inserts from updates at that level. (I'm also a bit
concerned about turning the FSM into a source of contention.)

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2004-11-28 19:17:09 Re: Fix for "q" with psql display paging dumps out of psql
Previous Message Joe Conway 2004-11-28 18:50:16 Re: Status of server side Large Object support?