Re: Stats for inheritance trees

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Re: Stats for inheritance trees
Date: 2009-12-30 00:29:24
Message-ID: 4823.1262132964@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> 3. Ideally autovacuum would know enough to perform ANALYZEs on
> inheritance parents after enough churn has occurred in their child
> table(s). I am not entirely clear about a good way to do that.
> We could have it just directly force an ANALYZE on parent(s) of any
> table it has chosen to ANALYZE, but that might be overkill --- in
> particular leading to excess ANALYZEs when several children receive
> a lot of updates.

I've been looking at this for a bit, and I think the only reasonable
way to do it is to make the pgstats mechanism track the need for
ANALYZE on a parent table. A hack like I suggested above would make
the autovacuum.c code even messier than it already is, and it seems
inevitable that we'd get duplicate analyze actions from different
autovac workers.

Now, I don't really want to add Yet Another per-table counter to pgstats
for this. The stats are big enough already. However, the existing
mechanism for triggering ANALYZE looks pretty bogus to me as I look at
it now: there's a last_anl_tuples value with a very hazy definition,
and what's worse it's being computed off numbers that may be only crude
estimates from ANALYZE. What I propose doing is to replace that counter
with a "changes_since_analyze" counter, which can be managed very
simply:

* when a tabstat message comes in, increment changes_since_analyze by
the sum of t_tuples_inserted + t_tuples_updated + t_tuples_deleted;

* when an analyze report message comes in, reset changes_since_analyze
to zero.

This gives us a number that is actually pretty credible and can still
be compared to the analyze threshold the same as before. I think the
current definition dates from before we had accurate
insert/delete/update tracking, but now that we have that, we should
use it.

Now, having done that, what I would suggest doing is having autovacuum
propagate the changes_since_analyze count that it sees up to the parent
table(s) whenever it does an autoanalyze. (This would require adding a
new message type that allows reporting a changes_since_analyze increment
independently of inserted/updated/deleted, or else adding
changes_since_analyze as an independent field in regular tabstat
messages.)

In most cases, with the parent table probably smaller than the child
tables, this would immediately make the parent a candidate for analyze.
That might be overkill, in which case we could try multiplying the count
by some sort of derating factor, but getting hold of a good derating
factor might be more expensive than it's worth --- I think you'd have to
look at all the other children of the same parent to see how big the
current one is compared to the rest.

Thoughts?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2009-12-30 01:21:11 Re: Stats for inheritance trees
Previous Message Simon Riggs 2009-12-30 00:13:01 Re: Hot Standy introduced problem with query cancel behavior