Weirdness with the stats collector process

From: Matthew Musgrove <mmusgrove(at)emortgagelogic(dot)com>
To: "pgsql-general(at)postgresql(dot)org" <pgsql-general(at)postgresql(dot)org>
Subject: Weirdness with the stats collector process
Date: 2016-07-25 18:50:30
Message-ID: D6E71BEFAD7BEB4FBCD8AE74FADB1265011BB40FC749@win-8-eml-ex1.eml.local
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

One of our instances has been behaving -- oddly. Most queries are blazing fast. It appears to just be some of the stat views that are slow.

Queries against the following views are quick: pg_stat_activity, pg_stat_xact_all_tables, pg_stat_xact_sys_tables, pg_stat_xact_user_tables, pg_statio_sys_sequences (no system sequences), pg_stat_xact_user_functions, and pg_stat_replication.

When I query one of the other statistics views, the queries take roughly 10 seconds when they used to take milliseconds. Here's an explain of one such query: https://explain.depesz.com/s/IVEu

Logs show 'using stale statistics instead of current ones because stats collector is not responding' when I query the stats views.

CPU usage is very low. Disk I/O looks okay to me.

-bash-4.1$ iostat
Linux 2.6.32-573.12.1.el6.x86_64 (hostname) 07/25/2016 _x86_64_ (16 CPU)

avg-cpu: %user %nice %system %iowait %steal %idle
2.80 0.00 0.15 0.01 0.00 97.04

Device: tps Blk_read/s Blk_wrtn/s Blk_read Blk_wrtn
sda 0.45 4.85 7.45 33142948 50914838
sdb 31.48 1403.57 713.55 9597023154 4878965216
dm-0 0.14 0.58 0.56 3949720 3847592
dm-1 1.00 4.26 6.83 29155482 46727688

Most of the files in $PGDATA/pg_stat_tmp haven't been updated by PostgreSQL since July 7th at 14:16. The one exception is pgss_query_texts.stat. I shouldn't have but I touched $PGDATA/pg_stat_tmp/db* on July 12th at 15:14 (doing so did trigger an autovacuum though).

I set track_activities and track_counts to off and reloaded config (pg_ctl reload). The log indicated that it noticed the change. I created a directory /dev/shm/pg_stat_tmp, set track_activities and track_counts on, set stats_temp_directory to '/dev/shm/pg_stat_tmp' and reloaded config. Again, the log indicated that it noticed the change but nothing happened with stats. The new directory is empty.

I notified the statistics collector to quit (kill -QUIT). It did and the postmaster restarted it. The new stats directory is still empty. The only updates to the old stats directory is the pgss_query_texts.stat file.

I tried getting stack traces and the like with gdb and strace but it wasn't really helpful.

Most of what I have tried was the result of help by Jim Nasby on postgresteam.slack.com. He and Lukas Fittl suggested that I post about my problem. There's one thing that we all agree on... It should not be so hard to figure out what is going on with the statistics collector.

I tried turning stats off and on again today. Here's what the two pg_stat_tmp directories look like today:
so now the directory looks like:

-bash-4.1$ ls -alh 9.4/data/pg_stat_tmp/
total 4.3M
drwx------ 2 postgres postgres 4.0K Jul 7 14:16 .
drwx------ 19 postgres postgres 4.0K Jul 25 11:20 ..
-rw------- 1 postgres postgres 3.4K Jul 12 15:14 db_0.stat
-rw------- 1 postgres postgres 26K Jul 12 15:14 db_13003.stat
-rw------- 1 postgres postgres 13K Jul 12 15:14 db_1.stat
-rw------- 1 postgres postgres 140K Jul 12 15:14 db_2473933.stat
-rw------- 1 postgres postgres 203K Jul 12 15:14 db_2476698.stat
-rw------- 1 postgres postgres 27K Jul 12 15:14 db_2595334.stat
-rw------- 1 postgres postgres 28K Jul 12 15:14 db_2631762.stat
-rw------- 1 postgres postgres 1.5K Jul 7 14:16 global.stat
-rw------- 1 postgres postgres 3.9M Jul 25 13:39 pgss_query_texts.stat
-bash-4.1$ ls -alh /dev/shm/pg_stat_tmp/
total 0
drwxr-xr-x 2 postgres postgres 40 Jul 12 17:19 .
drwxrwxrwt 3 root root 80 Jul 11 17:31 ..

Does anyone have any suggestions on how to:
- see what the statistics collector is doing?
- tell the postmaster to give the statistics collector the up to date config?
- get stats working without restarting the instance? (I am not yet sure when that can happen but it would have to be scheduled [in advance] to occur during an after hours maintenance window.

Thanks in advance,
Matt

--
Matthew Musgrove
Senior Software Engineer
Assurant Mortgage Solutions
817.788.4482
mmusgrove(at)emortgagelogic(dot)com<mailto:mmusgrove(at)emortgagelogic(dot)com>
matthew(dot)musgrove(at)assurant(dot)com<mailto:matthew(dot)musgrove(at)assurant(dot)com>

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Tom Lane 2016-07-25 20:20:52 Re: Weirdness with the stats collector process
Previous Message Peter Devoy 2016-07-25 17:10:59 Re: Return results of join with polymorphically-defined table in pl/pgsql