Who is causing all this i/o?

From: Craig James <craig_james(at)emolecules(dot)com>
To: pgsql-admin(at)postgresql(dot)org
Subject: Who is causing all this i/o?
Date: 2011-05-20 21:14:14
Message-ID: 4DD6D9A6.1010101@emolecules.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-admin

Our development server (PG 8.4.4 on Ubuntu server) is constantly doing something, and I can't figure out what. The two production servers, which are essentially identical, don't show these symptoms. In a nutshell, it's showing 10K blocks per second of data going out, all the time, and essentially zero blocks per second of input.

To start with, if I turn off Postgres, everything is quiet:

$ /etc/init.d/postgresql stop
Stopping PostgreSQL: server stopped
ok
[radon:root] /emi/logs/londiste # vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 126140 1115944 112692 10466452 0 0 132 829 1 3 2 1 97 1
0 0 126140 1116068 112692 10466452 0 0 0 0 150 175 0 0 100 0
0 0 126140 1116184 112708 10466456 0 0 0 94 123 153 0 0 100 0
0 0 126140 1116432 112708 10466456 0 0 0 0 96 126 0 0 100 0
0 0 126140 1116928 112708 10466456 0 0 0 0 94 111 0 0 100 0
0 0 126140 1116688 112708 10466456 0 0 0 0 114 144 0 0 100 0
0 0 126140 1116364 112708 10466456 0 0 18 74 229 266 1 0 99 0
0 0 126140 1116488 112708 10466492 0 0 0 0 111 138 0 0 100 0
1 0 126140 1116868 112716 10466492 0 0 0 20 96 121 0 0 100 0

But when I restart Postgres, it looks like this, and runs this way 24/7. Notice the "bo" column:

$ /etc/init.d/postgresql start
Starting PostgreSQL: ok
$ vmstat 2
procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
r b swpd free buff cache si so bi bo in cs us sy id wa
0 0 126140 1078020 112744 10480652 0 0 132 829 1 3 2 1 97 1
0 1 126140 1054320 112744 10487792 0 0 52 10714 900 976 2 1 97 0
0 1 126140 1015988 112760 10511060 0 0 1828 9620 1491 3527 5 2 87 6
0 1 126140 1008316 112768 10517936 0 0 672 10994 982 1209 1 2 86 11
0 0 126140 1017780 112776 10528824 0 0 1466 10746 1003 1428 2 1 89 8
0 0 126140 1013324 112784 10533680 0 0 0 9718 766 858 1 0 99 0
1 0 126140 991600 112784 10539220 0 0 0 10714 962 1127 2 1 97 0
0 0 126140 999924 112788 10544380 0 0 6 10726 918 1227 2 1 97 0
0 0 126140 995700 112800 10548816 0 0 0 9744 823 889 1 1 99 0
2 0 126140 964560 112804 10554048 0 0 2 10718 903 1196 3 1 96 0

That seems to be a lot of output data going somewhere, but there doesn't seem to be anything going on that should cause it:

postgres=# select datname, procpid, usename, substr(current_query,0,25) current_query, waiting from pg_stat_activity;
datname | procpid | usename | current_query | waiting
------------+---------+----------+--------------------------+---------
postgres | 12927 | postgres | select datname, procpid, | f
global | 12185 | postgres | <IDLE> | f
accounting | 12367 | postgres | <IDLE> | f
orders | 12225 | orders | <IDLE> | f
archive | 12267 | postgres | <IDLE> | f
(5 rows)

After a lot of digging around, I found this in the /postgres/pg_stat_tmp directory. If I list the directory including the i-nodes once every second, I find that a new 2MB file is being created roughly once every two seconds:

# ls -li
total 2384
69731459 -rw------- 1 postgres postgres 2437957 2011-05-20 14:07 pgstat.stat
# ls -li
total 2384
69731258 -rw------- 1 postgres postgres 2437957 2011-05-20 14:07 pgstat.stat
# ls -li
total 2384
69731459 -rw------- 1 postgres postgres 2437957 2011-05-20 14:07 pgstat.stat
# ls -li
total 2384
69731459 -rw------- 1 postgres postgres 2437957 2011-05-20 14:07 pgstat.stat
# ls -li
total 3332
69731258 -rw------- 1 postgres postgres 2437957 2011-05-20 14:07 pgstat.stat
69731459 -rw------- 1 postgres postgres 970752 2011-05-20 14:07 pgstat.tmp
# ls -li
total 2384
69731459 -rw------- 1 postgres postgres 2437957 2011-05-20 14:07 pgstat.stat
# ls -li
total 2384
69731258 -rw------- 1 postgres postgres 2437957 2011-05-20 14:07 pgstat.stat
# ls -li
total 2384
69731258 -rw------- 1 postgres postgres 2437957 2011-05-20 14:07 pgstat.stat
# ls -li
total 2384
69731258 -rw------- 1 postgres postgres 2437957 2011-05-20 14:07 pgstat.stat

This is a development server, so there are no queries coming in from the web or other random activities. Nothing is going on.

The other two servers are configured identically. If I diff the configuration files, the only difference is the IP addresses for the "listen" section.

Can anyone tell me what's going on? Why is pgstat.stat being rewritten on this server constantly and not on the other two servers?

Thanks,
Craig

Responses

Browse pgsql-admin by date

  From Date Subject
Next Message Joshua D. Drake 2011-05-20 22:38:34 Re: Who is causing all this i/o?
Previous Message Tim Lewis 2011-05-20 15:58:05 Re: pg_archivecleanup with multiple slaves