last_statrequest is in the future

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: last_statrequest is in the future
Date: 2010-03-24 15:39:14
Message-ID: 22006.1269445154@sss.pgh.pa.us
Lists: pgsql-hackers

Well, I didn't actually think that this patch
http://archives.postgresql.org/pgsql-committers/2010-03/msg00181.php
would yield much insight, but lookee what we have here:
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=jaguar&dt=2010-03-24%2004:00:07

[4ba99150.5099:483] LOG: statement: VACUUM ANALYZE num_exp_add;
[4ba99145.5071:1] LOG: last_statrequest is in the future, resetting
[4ba99145.5071:2] LOG: last_statrequest is in the future, resetting
[4ba99145.5071:3] LOG: last_statrequest is in the future, resetting
[4ba99145.5071:4] LOG: last_statrequest is in the future, resetting
[4ba99145.5071:5] LOG: last_statrequest is in the future, resetting
...
[4ba99145.5071:497] LOG: last_statrequest is in the future, resetting
[4ba99145.5071:498] LOG: last_statrequest is in the future, resetting
[4ba99145.5071:499] LOG: last_statrequest is in the future, resetting
[4ba99145.5071:500] LOG: last_statrequest is in the future, resetting
[4ba99150.5099:484] WARNING: pgstat wait timeout

There are multiple occurrences of "pgstat wait timeout" in the
postmaster log (some evidently from autovacuum, because they don't show
up as regression diffs), and every one of them is associated with a
bunch of "last_statrequest is in the future" bleats.

So at least on jaguar, it seems that the reason for this behavior is
that the system clock is significantly skewed between the stats
collector process and the backends, to the point where stats updates
generated by the collector will never appear new enough to satisfy the
requesting backends. I think I'm going to go back and modify the code
to show the actual numbers involved so we can see just how bad it is ---
but the skew must be more than five seconds or we'd not be seeing this
failure. That seems to me to put it in the class of "system bug".
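
The failure mode described above can be sketched with a toy model. This is a simplified simulation, not the actual pgstat code: the class `Collector`, the function `backend_read_stats`, and the constants `RETRY_DELAY_MS` and `STAT_INTERVAL_MS` are assumptions made for illustration. Only the five-second limit and the log message text come from the message itself. Clocks are simulated integers in milliseconds, so nothing actually sleeps.

```python
from dataclasses import dataclass, field

RETRY_DELAY_MS = 10      # assumed polling interval between backend retries
MAX_WAIT_MS = 5000       # the five-second limit mentioned in the message
STAT_INTERVAL_MS = 500   # assumed maximum staleness a backend will accept


@dataclass
class Collector:
    """Toy stats collector whose clock is offset from the backend's clock."""
    skew_ms: int                  # collector clock minus backend clock
    last_statrequest: int = 0
    last_statwrite: int = 0
    log: list = field(default_factory=list)

    def now(self, true_ms):
        # The collector reads a skewed version of the backend's time.
        return true_ms + self.skew_ms

    def handle_inquiry(self, cutoff_ms, true_ms):
        # Remember the newest cutoff any backend has asked for.
        self.last_statrequest = max(self.last_statrequest, cutoff_ms)
        if self.last_statwrite < self.last_statrequest:
            # "Write" the stats file, stamped with the collector's own clock.
            self.last_statwrite = self.now(true_ms)
            if self.last_statrequest > self.last_statwrite:
                # The request is newer than the collector's idea of "now".
                self.log.append("last_statrequest is in the future, resetting")
                self.last_statrequest = self.last_statwrite
        return self.last_statwrite


def backend_read_stats(collector, start_ms=100_000):
    """Return True if a fresh-enough stats file appears before the timeout."""
    true_ms = start_ms                    # the backend clock is the reference
    min_ts = true_ms - STAT_INTERVAL_MS   # oldest acceptable file timestamp
    file_ts = collector.last_statwrite
    for _ in range(MAX_WAIT_MS // RETRY_DELAY_MS):
        if file_ts >= min_ts:
            return True
        # File is too old: ask for a rewrite, then wait one retry interval.
        file_ts = collector.handle_inquiry(min_ts, true_ms)
        true_ms += RETRY_DELAY_MS
    return False                          # corresponds to "pgstat wait timeout"
```

In this model, `backend_read_stats(Collector(skew_ms=0))` succeeds on the second poll with nothing logged, whereas a collector whose clock runs ten seconds behind (`skew_ms=-10_000`) logs the "in the future" message on every one of the 500 polls and the backend gives up. Each file the collector writes is stamped with its own lagging clock, so it never reaches the backend's cutoff within the wait window.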

Should we redesign the stats signaling logic to work around this,
or just hope we can nag kernel people into fixing it?

regards, tom lane
