Re: Potential autovacuum optimization: new tables

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Potential autovacuum optimization: new tables
Date: 2012-10-13 01:49:48
Message-ID: 5078C8BC.5060303@agliodbs.com
Lists: pgsql-hackers


> No, it's not that easy. The question you have to ask is "when has that
> initial write burst stopped?". As an example, if autovacuum happened to
> see that table in the instant after CREATE, it might autovacuum it while
> it's still empty, and then this rule fails to trigger any further effort.

Well, frankly, it would be useful to know it's empty too. If you really
wanna see that 1000-row default estimate bite you on the tuchas, try a
JOIN against an empty new table.

> Personally I've always thought that autovacuum's rules should be based
> on a percentage of rows changed, not an absolute threshold (or maybe in
> addition to an absolute threshold). This way, if you create a table and
> insert 10 rows, that would make it subject to analyze on-sight, even if
> autovac had managed to pass by while it was still empty, because the
> percentage-changed is infinite. Then, if you insert the other 35 rows
> you meant to insert, it's *again* subject to autoanalyze on the next
> pass, because the percentage-changed is still 350%

> I remember having got voted down on the percentage approach back when
> we first put AV into core, but I remain convinced that decision was a
> bad one.

Yeah, I was one of the ones voting against you. The reason not to go
percentage-only is small tables. Imagine that you have a table with
18 rows, and analyze_threshold is 0 and analyze_scale_factor is 0.1.

Add two rows --> ANALYZE
UPDATE two rows --> ANALYZE
UPDATE three more rows --> ANALYZE
DELETE three rows --> ANALYZE

Without a threshold, any table under 100 rows which gets UPDATEs would
be continually in the autoanalyze queue, which would not be beneficial;
the churn in pg_statistic alone would be detrimental.
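
Just to make that arithmetic concrete, here's a throwaway sketch (not the
actual autovacuum code, just the documented trigger condition:
changes_since_analyze > analyze_threshold + analyze_scale_factor *
reltuples) showing why a zero threshold keeps the 18-row table in the
queue, while the stock threshold of 50 quiets it down:

/*
 * Toy model of the documented autoanalyze trigger, not PostgreSQL source.
 * With the 18-row example above (threshold = 0, scale_factor = 0.1) the
 * trigger point is 1.8 rows, so every 2-row change re-analyzes.
 */
#include <stdbool.h>
#include <stdio.h>

static bool
needs_autoanalyze(double reltuples, double changes_since_analyze,
                  double analyze_threshold, double analyze_scale_factor)
{
    double anlthresh = analyze_threshold + analyze_scale_factor * reltuples;

    return changes_since_analyze > anlthresh;
}

int
main(void)
{
    /* 18-row table, analyze_threshold = 0, analyze_scale_factor = 0.1 */
    printf("2 rows changed, threshold 0  -> %s\n",
           needs_autoanalyze(18, 2, 0, 0.1) ? "ANALYZE" : "skip");
    /* same change with the stock threshold of 50 */
    printf("2 rows changed, threshold 50 -> %s\n",
           needs_autoanalyze(18, 2, 50, 0.1) ? "ANALYZE" : "skip");
    return 0;
}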

I guess the real issue is that we tried to solve the problem of ANALYZEing
tables of radically different sizes with a simple heuristic of
threshold+scale_factor, and that's proving too simple for actual
production sites. The other place where autoanalyze often falls down is
the high end (tables with a million rows).
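
For scale, with the stock defaults (autovacuum_analyze_threshold = 50,
autovacuum_analyze_scale_factor = 0.1) the back-of-the-envelope arithmetic
below (again, not the real code) shows how the same pair of knobs behaves
at both ends of the size range:

/*
 * Changed-rows trigger point for the documented formula
 * threshold + scale_factor * reltuples, at the stock defaults.
 */
#include <stdio.h>

int
main(void)
{
    double sizes[] = {0, 18, 1000, 1000000, 100000000};
    int    n = (int) (sizeof(sizes) / sizeof(sizes[0]));

    for (int i = 0; i < n; i++)
        printf("reltuples = %11.0f -> analyze once > %11.0f rows change\n",
               sizes[i], 50 + 0.1 * sizes[i]);
    return 0;
}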

Can anyone think of a new heuristic that doesn't involve adding 2-6 new
GUC knobs?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
