From: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Subject: Reduce maximum error in tuples estimation after vacuum.
PostgreSQL estimates the number of live tuples after a vacuum that
has left some buffers unscanned. This estimation works well in most
cases, but produces a completely wrong result when the tuple density
is strongly imbalanced across the table.
create table t (a int, b int);
insert into t
  (select a, (random() * 100000)::int
     from generate_series((select count(*) from t) + 1, 1000000) a);
update t set b = b + 1 where a < (select count(*) from t) * 0.7;
delete from t where a < (select count(*) from t) * 0.99;
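For reference, the row counts this script leaves behind can be checked
with a quick sanity calculation (assuming the table starts empty, so
generate_series runs from 1 to 1000000, and noting that the SQL uses
exact numeric arithmetic for the cutoff):

```python
# "delete from t where a < 1000000 * frac" removes a = 1 .. cutoff-1,
# so rows with a >= cutoff survive.
total = 1_000_000
for frac in (0.9, 0.95, 0.99):
    cutoff = round(total * frac)        # round() to mimic SQL's exact numeric
    remaining = total - cutoff + 1      # surviving rows: a in cutoff .. total
    print(frac, remaining)
```

For a deletion ratio of 0.99 this leaves 10001 rows, matching the
"tups real" column in the tables below.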
After this, pg_stat_user_tables.n_live_tup shows 417670, which is
about 41 times the real number of rows, 10001. And what makes it
worse, with the default settings neither autovacuum nor autoanalyze
will run until n_dead_tup grows to roughly eight times the real
number of tuples in the table.
| postgres=# select n_live_tup, n_dead_tup
|             from pg_stat_user_tables where relname='t';
|  n_live_tup | n_dead_tup
| ------------+------------
|      417670 |          0
| postgres=# select reltuples from pg_class where relname='t';
| postgres=# select count(*) from t;
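To make the "eight times" figure concrete, here is the trigger
condition worked out, assuming the default autovacuum_vacuum_threshold
(50) and autovacuum_vacuum_scale_factor (0.2):

```python
# Autovacuum fires once n_dead_tup exceeds
#   autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
# With the overestimated reltuples above, the bar sits far higher than
# the real row count.
autovacuum_vacuum_threshold = 50       # default
autovacuum_vacuum_scale_factor = 0.2   # default

reltuples = 417670   # the overestimate reported above
real_rows = 10001    # actual count(*)

trigger = autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples
print(trigger)               # 83584.0 dead tuples needed
print(trigger / real_rows)   # ~8.4x the real row count
```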
Using n_dead_tup from before the vacuum seems to make this better,
but I heard that that plan was abandoned for some reason I don't
know. So I've come up with another plan: using the FSM to estimate
the tuple density in unscanned pages. The point is to make the
estimation rely on the uniformity of tuple length instead of tuple
density. This change seems to keep the error within a few times the
real number of tuples. The additional page reads for the FSM are
about 1/4000th (SlotsPerFSMPage) of the skipped pages, which I
suppose is tolerable during vacuum.
The overall algorithm can be illustrated as below:
- Sum up, over all scanned pages, the used bytes, the max offnum
  (PageGetMaxOffsetNumber), the maximum free bytes for tuple data,
  and the free bytes after page vacuum.
- Sum up the free bytes reported by the FSM over all skipped pages.
- Calculate the mean tuple length from the overall used bytes, the
  sum of max offnums, and the number of scanned pages.
- Guess the tuple density in the skipped pages using the overall
  free bytes from the FSM and the mean tuple length calculated above.
- Finally, feed the estimated number of live tuples BEFORE the
  vacuum into vac_estimate_reltuples.
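The steps above can be sketched as follows. This is a minimal
illustration, not the patch code: the function name is mine, and it
ignores page header and line-pointer overhead, treating the whole
8192-byte page (BLCKSZ) as tuple space.

```python
# Hedged sketch of the proposed FSM-based estimation (not the actual patch).
BLCKSZ = 8192  # PostgreSQL's default page size

def estimate_live_tuples(scanned_used_bytes, scanned_offnum_sum,
                         skipped_pages, fsm_free_bytes):
    """Estimate live tuples when some pages were skipped, using the
    FSM-reported free space of the skipped pages."""
    if scanned_offnum_sum == 0:
        return 0
    # Mean tuple length observed on the scanned pages.
    mean_tuple_len = scanned_used_bytes / scanned_offnum_sum
    # Bytes presumably occupied on the skipped pages: total page space
    # minus the free space the FSM reports for them.
    skipped_used = max(skipped_pages * BLCKSZ - fsm_free_bytes, 0)
    # Assume the skipped pages hold tuples of the mean length.
    skipped_tuples = skipped_used / mean_tuple_len
    return scanned_offnum_sum + skipped_tuples
```

For example, 100 scanned pages holding 10000 tuples in 400000 bytes
give a mean length of 40 bytes; if 100 skipped pages report 419200
free bytes via the FSM, the skipped pages are estimated to hold
another 10000 tuples.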
Of course this method is affected by an imbalance of tuple LENGTH
instead, but the error also seems to stay within a few times the real
number of rows. For rows of invariable length, the test on head shows
the following, where "tups est" is pg_class.reltuples and "tups real"
is count(*).
 del% | pages | n_live_tup | tups est | tups real | est/real | bufs
------+-------+------------+----------+-----------+----------+------
 0.9  |  4425 |     100001 |   470626 |    100001 |    4.706 | 3985
 0.95 |  4425 |      50001 |   441196 |     50001 |    8.824 | 4206
 0.99 |  4425 |     417670 |   417670 |     10001 |   41.763 | 4383
and with the patch:
 del% | pages | n_live_tup | tups est | tups real | est/real | bufs
------+-------+------------+----------+-----------+----------+------
 0.9  |  4425 |     106169 |   106169 |    100001 |    1.062 | 3985
 0.95 |  4425 |      56373 |    56373 |     50001 |    1.127 | 4206
 0.99 |  4425 |      10001 |    16535 |     10001 |    1.653 | 4383
What do you think about this?
The attached files are:
- vacuum_est_improve_20130614.patch: the patch for this proposal
- vactest.sql: sql script to cause the situation
- vactest.sh: test script to find the errors related to this patch
- test_result.txt: all of the test results for the various deletion
  ratios which the test script above yields
NTT Open Source Software Center