Re: Potential autovacuum optimization: new tables

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Potential autovacuum optimization: new tables
Date: 2012-10-13 01:49:48
Message-ID: 5078C8BC.5060303@agliodbs.com
Lists: pgsql-hackers


> No, it's not that easy. The question you have to ask is "when has that
> initial write burst stopped?". As an example, if autovacuum happened to
> see that table in the instant after CREATE, it might autovacuum it while
> it's still empty, and then this rule fails to trigger any further effort.

Well, frankly, it would be useful to know it's empty too. If you really
wanna see that 1000-row default estimate bite you on the tuchas, try a
JOIN against an empty new table.

> Personally I've always thought that autovacuum's rules should be based
> on a percentage of rows changed, not an absolute threshold (or maybe in
> addition to an absolute threshold). This way, if you create a table and
> insert 10 rows, that would make it subject to analyze on-sight, even if
> autovac had managed to pass by while it was still empty, because the
> percentage-changed is infinite. Then, if you insert the other 35 rows
> you meant to insert, it's *again* subject to autoanalyze on the next
> pass, because the percentage-changed is still 350%

> I remember having got voted down on the percentage approach back when
> we first put AV into core, but I remain convinced that decision was a
> bad one.

Yeah, I was one of the ones voting against you. The reason not to go
percentage-only is small tables. Imagine that you have a table with
18 rows, and analyze_threshold is 0 and analyze_scale_factor is 0.1.

Add two rows --> ANALYZE
UPDATE two rows --> ANALYZE
UPDATE three more rows --> ANALYZE
DELETE three rows --> ANALYZE

Without a threshold, any table under 100 rows which gets UPDATEs would
be continually in the autoanalyze queue, which would not be beneficial;
the churn in pg_statistic alone would be detrimental.
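
Just to make that arithmetic concrete, here's a throwaway sketch (not the
actual autovacuum code, just the documented trigger condition:
changes_since_analyze > analyze_threshold + analyze_scale_factor *
reltuples) showing why a zero threshold keeps the 18-row table in the
queue, while the stock threshold of 50 quiets it down:

/*
 * Toy model of the documented autoanalyze trigger, not PostgreSQL source.
 * With the 18-row example above (threshold = 0, scale_factor = 0.1) the
 * trigger point is 1.8 rows, so every 2-row change re-analyzes.
 */
#include <stdbool.h>
#include <stdio.h>

static bool
needs_autoanalyze(double reltuples, double changes_since_analyze,
                  double analyze_threshold, double analyze_scale_factor)
{
    double anlthresh = analyze_threshold + analyze_scale_factor * reltuples;

    return changes_since_analyze > anlthresh;
}

int
main(void)
{
    /* 18-row table, analyze_threshold = 0, analyze_scale_factor = 0.1 */
    printf("2 rows changed, threshold 0  -> %s\n",
           needs_autoanalyze(18, 2, 0, 0.1) ? "ANALYZE" : "skip");
    /* same change with the stock threshold of 50 */
    printf("2 rows changed, threshold 50 -> %s\n",
           needs_autoanalyze(18, 2, 50, 0.1) ? "ANALYZE" : "skip");
    return 0;
}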

I guess the real issue is that we tried to solve the problem of ANALYZEing
tables of radically different sizes with a simple heuristic of
threshold+scale_factor, and that's proving too simple for actual
production sites. The other place where autoanalyze often falls down is
the high end (tables with a million rows).
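
For scale, with the stock defaults (autovacuum_analyze_threshold = 50,
autovacuum_analyze_scale_factor = 0.1) the back-of-the-envelope arithmetic
below (again, not the real code) shows how the same pair of knobs behaves
at both ends of the size range:

/*
 * Changed-rows trigger point for the documented formula
 * threshold + scale_factor * reltuples, at the stock defaults.
 */
#include <stdio.h>

int
main(void)
{
    double sizes[] = {0, 18, 1000, 1000000, 100000000};
    int    n = (int) (sizeof(sizes) / sizeof(sizes[0]));

    for (int i = 0; i < n; i++)
        printf("reltuples = %11.0f -> analyze once > %11.0f rows change\n",
               sizes[i], 50 + 0.1 * sizes[i]);
    return 0;
}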

Can anyone think of a new heuristic that doesn't involve adding 2-6 new
GUC knobs?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
