Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.

From: David Gould <daveg(at)sonic(dot)net>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Alexander Kuzmenkov <a(dot)kuzmenkov(at)postgrespro(dot)ru>, pgsql-hackers(at)lists(dot)postgresql(dot)org, Alina Alexeeva <alexeeva(at)adobe(dot)com>, Ullas Lakkur Raghavendra <lakkurra(at)adobe(dot)com>
Subject: Re: [patch] BUG #15005: ANALYZE can make pg_class.reltuples inaccurate.
Date: 2018-03-03 02:57:52
Message-ID: 20180302185752.46b82671@engels
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, 02 Mar 2018 17:17:29 -0500
Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> But by the same token, analyze only looked at 0.0006 of the pages. It's
> nice that for you, that's enough to get a robust estimate of the density
> everywhere; but I have a nasty feeling that that won't hold good for
> everybody.

My grasp of statistics is somewhat weak, so please inform me if I've got
this wrong, but every time I've looked into it I've found that one can get
pretty good accuracy and confidence with fairly small samples. Typically 1000
samples will serve no matter the population size if the desired margin of
error is 5%. Even with 99% confidence and a 1% margin of error it takes less
than 20,000 samples. See the table at:

http://www.research-advisors.com/tools/SampleSize.htm

Since we have by default 30000 sample pages and since ANALYZE takes some
trouble to get a random sample I think we really can rely on the results of
extrapolating reltuples from analyze.

-dg

--
David Gould daveg(at)sonic(dot)net
If simplicity worked, the world would be overrun with insects.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2018-03-03 03:46:13 Re: zheap: a new storage format for PostgreSQL
Previous Message David Steele 2018-03-03 02:08:22 Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions