Re: heavy swapping, not sure why

From: Lonni J Friedman <netllama(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-general <pgsql-general(at)postgresql(dot)org>
Subject: Re: heavy swapping, not sure why
Date: 2011-08-30 16:17:54
Message-ID: CAP=oouFOKT7Mbukp3bKx9rJ2sPRf=okrZMD_=5mAvE9y0qxCgA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Mon, Aug 29, 2011 at 5:42 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Lonni J Friedman <netllama(at)gmail(dot)com> writes:
>> I have several Linux-x68_64 based dedicated PostgreSQL servers where
>> I'm experiencing significant swap usage growth over time.  All of them
>> have fairly substantial amounts of RAM (not including swap), yet the
>> amount of swap that postgres is using ramps up over time and
>> eventually hurts performance badly.  In every case, simply restarting
>> postgresql frees up all the swap in use (until it ramps up again
>> later).
>
> If you're certain that it's restarting *postgres* that does it, and not
> restarting your application or pgbouncer or some other code, then it
> seems like you must have uncovered a memory leak someplace.  We haven't
> got nearly enough info here to diagnose it though.
>
> First thing I'd want to know is which process(es) exactly are bloating.
> The top output you showed us is unhelpful for that since it just shows
> them all as "postmaster" --- you'll need to match up the problem PIDs
> with "ps auxww" output.  Keep in mind also that top is pretty awful
> about distinguishing a process's actual memory use (private memory)
> from the portion of PG's shared memory that it happens to have touched.
> What you need to pay attention to is RES minus SHR, not either number
> alone.  With shared buffers set as high as you've got it, you'll
> probably not be able to be sure that a process is bloating until it's
> eaten hundreds of megs of private space.

In the past 18 hours, swap usage has nearly doubled on systemA:
$ free -m
total used free shared buffers cached
Mem: 56481 56210 271 0 11 52470
-/+ buffers/cache: 3727 52753
Swap: 1099 35 1064

As a reminder, this is what it was yesterday afternoon (roughly 18
hours earlier):
total used free shared buffers cached
Mem: 56481 55486 995 0 15 53298
-/+ buffers/cache: 2172 54309
Swap: 1099 18 1081

Here's top sorted by memory usage:
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2250 postgres 20 0 13.5g 10g 10g S 0.0 19.0 4:50.17
postmaster
20427 postgres 20 0 13.6g 8.6g 8.5g D 3.6 15.5 6:15.53
postmaster
1861 postgres 20 0 13.5g 2.2g 2.2g D 7.2 3.9 0:44.34
postmaster
32275 postgres 20 0 13.5g 2.0g 2.0g S 0.0 3.7 0:32.52
postmaster
2444 postgres 20 0 13.5g 2.0g 2.0g S 0.0 3.6 0:21.22
postmaster
21632 postgres 20 0 13.5g 2.0g 1.9g S 0.0 3.6 1:48.76
postmaster
1870 postgres 20 0 13.5g 1.7g 1.7g S 0.0 3.1 0:18.70
postmaster
2383 postgres 20 0 13.5g 1.7g 1.7g S 0.0 3.1 0:13.23
postmaster
32280 postgres 20 0 13.5g 1.6g 1.6g S 0.0 2.9 0:20.97
postmaster
2378 postgres 20 0 13.5g 1.3g 1.3g S 0.0 2.4 0:15.36
postmaster
2367 postgres 20 0 13.5g 1.3g 1.3g S 0.0 2.4 0:15.71
postmaster
2396 postgres 20 0 13.5g 1.2g 1.2g S 0.0 2.3 0:09.33
postmaster
31794 postgres 20 0 13.5g 1.2g 1.2g S 0.0 2.2 0:14.68
postmaster
1891 postgres 20 0 13.5g 1.2g 1.2g S 0.0 2.1 0:06.49
postmaster
2435 postgres 20 0 13.5g 1.2g 1.1g S 0.0 2.1 0:16.38
postmaster
2370 postgres 20 0 13.5g 1.1g 1.1g S 0.0 2.0 0:08.51
postmaster
2411 postgres 20 0 13.5g 1.0g 1.0g S 0.0 1.9 0:09.18
postmaster
31782 postgres 20 0 13.5g 981m 967m S 0.0 1.7 0:12.33
postmaster
32314 postgres 20 0 13.5g 909m 891m S 0.0 1.6 0:09.07
postmaster
2395 postgres 20 0 13.5g 890m 876m S 0.0 1.6 0:07.16
postmaster
2381 postgres 20 0 13.5g 829m 815m S 0.0 1.5 0:05.30
postmaster
2213 postgres 20 0 13.5g 829m 817m S 0.0 1.5 0:04.40
postmaster
2421 postgres 20 0 13.5g 803m 790m S 0.0 1.4 0:05.11
postmaster
1866 postgres 20 0 13.5g 797m 784m S 0.0 1.4 0:08.10
postmaster
2371 postgres 20 0 13.5g 793m 781m S 0.0 1.4 0:05.88
postmaster
3460 postgres 20 0 13.5g 786m 773m S 0.0 1.4 0:06.17
postmaster
2418 postgres 20 0 13.5g 604m 594m S 0.0 1.1 0:01.69
postmaster
2425 postgres 20 0 13.5g 582m 570m S 0.0 1.0 0:02.67
postmaster
5863 postgres 20 0 13.5g 358m 347m D 0.0 0.6 0:01.90
postmaster
5865 postgres 20 0 13.5g 293m 283m S 0.0 0.5 0:01.70
postmaster
2243 postgres 20 0 13.5g 289m 288m S 0.0 0.5 0:07.01
postmaster
5862 postgres 20 0 13.5g 195m 189m S 0.0 0.3 0:00.48
postmaster
1890 postgres 20 0 13.5g 149m 145m S 0.0 0.3 0:06.02
postmaster
5852 postgres 20 0 13.5g 132m 124m S 0.0 0.2 0:01.04
postmaster
32316 postgres 20 0 13.5g 21m 19m S 0.0 0.0 0:00.60
postmaster
1876 postgres 20 0 13.5g 21m 19m S 0.0 0.0 0:00.16
postmaster
5866 postgres 20 0 13.5g 17m 15m S 0.0 0.0 0:00.01
postmaster
2251 postgres 20 0 13.5g 17m 16m S 0.0 0.0 0:57.19
postmaster
32328 postgres 20 0 13.5g 16m 13m S 0.0 0.0 0:00.20
postmaster
4893 postgres 20 0 13.5g 15m 13m S 0.0 0.0 0:00.15
postmaster
2426 postgres 20 0 13.5g 14m 12m S 0.0 0.0 0:00.15
postmaster
1878 postgres 20 0 13.5g 14m 12m S 0.0 0.0 0:00.12
postmaster
2408 postgres 20 0 13.5g 13m 10m S 0.0 0.0 0:00.06
postmaster
1896 postgres 20 0 13.5g 12m 10m S 0.0 0.0 0:00.13
postmaster
2427 postgres 20 0 13.5g 10m 8436 S 0.0 0.0 0:00.07
postmaster
2401 postgres 20 0 13.5g 8484 6412 S 0.0 0.0 0:00.03
postmaster
5864 postgres 20 0 13.5g 7740 5836 S 0.0 0.0 0:00.00
postmaster
5853 postgres 20 0 13.5g 7528 5652 S 0.0 0.0 0:00.00 postmaster

I've attached the corresponding 'ps auxwww' output as well (since its
rather wide, and it will likely get mangled inside the email body),
but here it is too for those who don't want to deal with the
attachment:
##############
postgres 1861 2.1 4.1 14135092 2421300 ? Ds 08:37 0:45
postgres: lfriedman farm 127.0.0.1(34293) INSERT
postgres 1866 0.3 1.4 14142616 816732 ? Ss 08:38 0:08
postgres: lfriedman nightly 127.0.0.1(56661) idle
postgres 1870 0.9 3.1 14147660 1816576 ? Ss 08:38 0:18
postgres: lfriedman nightly 127.0.0.1(56665) idle
postgres 1876 0.0 0.0 14134040 21892 ? Ss 08:38 0:00
postgres: lfriedman cbs 127.0.0.1(56670) idle
postgres 1878 0.0 0.0 14134636 14880 ? Ss 08:38 0:00
postgres: lfriedman cbs 127.0.0.1(56672) idle
postgres 1890 0.3 0.2 14134992 153224 ? Ss 08:40 0:06
postgres: lfriedman farm 127.0.0.1(56673) idle
postgres 1891 0.3 2.1 14134852 1215668 ? Ss 08:40 0:06
postgres: lfriedman farm 127.0.0.1(56674) idle
postgres 1896 0.0 0.0 14134612 13244 ? Ss 08:40 0:00
postgres: lfriedman cbs 127.0.0.1(56677) idle
postgres 2213 0.2 1.4 14142356 849448 ? Ss 08:40 0:04
postgres: lfriedman nightly 127.0.0.1(56693) idle
postgres 2243 0.0 0.5 14125816 296012 ? S Aug29 0:07
/usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres 2248 0.0 0.0 159892 908 ? Ss Aug29 0:00
postgres: logger process
postgres 2250 0.3 19.0 14133028 11013112 ? Ss Aug29 4:50
postgres: writer process
postgres 2251 0.0 0.0 14132476 17532 ? Ss Aug29 0:57
postgres: wal writer process
postgres 2252 0.0 0.0 14133884 2428 ? Ss Aug29 0:09
postgres: autovacuum launcher process
postgres 2253 0.0 0.0 161272 1872 ? Ss Aug29 0:33
postgres: stats collector process
postgres 2367 0.8 2.3 14147948 1385192 ? Ss 08:41 0:15
postgres: lfriedman nightly 127.0.0.1(56695) idle
postgres 2370 0.4 1.9 14147088 1151312 ? Ss 08:41 0:08
postgres: lfriedman nightly 127.0.0.1(56698) idle
postgres 2371 0.3 1.4 14142216 814360 ? Ds 08:41 0:05
postgres: lfriedman nightly 127.0.0.1(56699) UPDATE
postgres 2378 0.8 2.3 14142980 1387028 ? Ss 08:41 0:15
postgres: lfriedman nightly 127.0.0.1(56706) idle
postgres 2381 0.2 1.4 14144484 849852 ? Ss 08:41 0:05
postgres: lfriedman nightly 127.0.0.1(56709) idle
postgres 2383 0.7 3.1 14147196 1800300 ? Ss 08:41 0:13
postgres: lfriedman nightly 127.0.0.1(56711) idle
postgres 2395 0.3 1.5 14142816 911604 ? Ss 08:41 0:07
postgres: lfriedman nightly 127.0.0.1(56722) idle
postgres 2396 0.5 2.2 14144544 1302852 ? Ss 08:41 0:09
postgres: lfriedman nightly 127.0.0.1(56723) idle
postgres 2401 0.0 0.0 14134040 8984 ? Ss 08:41 0:00
postgres: lfriedman cbs 127.0.0.1(56728) idle
postgres 2408 0.0 0.0 14134824 13940 ? Ss 08:41 0:00
postgres: lfriedman cbs 127.0.0.1(56735) idle
postgres 2411 0.4 1.8 14147688 1087016 ? Ss 08:41 0:09
postgres: lfriedman nightly 127.0.0.1(56738) idle
postgres 2418 0.0 1.0 14141992 619484 ? Ss 08:41 0:01
postgres: lfriedman nightly 127.0.0.1(56745) idle
postgres 2421 0.2 1.4 14142428 822400 ? Ss 08:41 0:05
postgres: lfriedman nightly 127.0.0.1(56748) idle
postgres 2425 0.1 1.0 14142500 596420 ? Ss 08:41 0:02
postgres: lfriedman nightly 127.0.0.1(56752) idle
postgres 2426 0.0 0.0 14134720 15896 ? Ss 08:41 0:00
postgres: lfriedman cbs 127.0.0.1(56753) idle
postgres 2427 0.0 0.0 14134040 10536 ? Ss 08:41 0:00
postgres: lfriedman cbs 127.0.0.1(56754) idle
postgres 2435 0.8 2.0 14147132 1210368 ? Ss 08:41 0:16
postgres: lfriedman nightly 127.0.0.1(56761) idle
postgres 2444 1.1 3.5 14136592 2063556 ? Ss 08:42 0:21
postgres: lfriedman farm 127.0.0.1(56769) idle
postgres 3460 0.5 1.3 14142640 805412 ? Ss 08:51 0:06
postgres: lfriedman nightly 127.0.0.1(37675) idle
postgres 4893 0.0 0.0 14134520 16068 ? Ss 09:02 0:00
postgres: lfriedman inventory 127.0.0.1(50384) idle
postgres 5852 0.6 0.2 14139168 135420 ? Ss 09:09 0:01
postgres: lfriedman nightly 127.0.0.1(39228) idle
postgres 5853 0.0 0.0 14133968 7528 ? Ss 09:09 0:00
postgres: lfriedman nightly 127.0.0.1(39229) idle
postgres 5862 0.3 0.3 14137720 200300 ? Ss 09:10 0:00
postgres: lfriedman nightly 127.0.0.1(39230) idle
postgres 5863 1.5 0.6 14144120 367536 ? Ss 09:10 0:01
postgres: lfriedman nightly 127.0.0.1(39231) idle
postgres 5864 0.0 0.0 14133968 7740 ? Ss 09:10 0:00
postgres: lfriedman nightly 127.0.0.1(39232) idle
postgres 5865 1.7 0.5 14144176 338516 ? Ss 09:10 0:02
postgres: lfriedman nightly 127.0.0.1(39233) idle
postgres 5866 0.0 0.0 14135036 18388 ? Ss 09:10 0:00
postgres: lfriedman nightly 127.0.0.1(39234) idle
postgres 20427 0.6 15.5 14232764 8991924 ? Ss Aug29 6:16
postgres: autovacuum worker process nightly
postgres 21632 0.1 3.5 14146368 2055856 ? Ss Aug29 1:48
postgres: lfriedman nightly 127.0.0.1(33664) idle
postgres 31782 0.3 1.7 14142696 1005052 ? Ss 08:18 0:12
postgres: lfriedman nightly 127.0.0.1(39318) idle
postgres 31794 0.4 2.1 14149300 1261472 ? Ss 08:18 0:14
postgres: lfriedman nightly 127.0.0.1(39330) idle
postgres 32275 1.0 3.6 14147716 2125608 ? Ss 08:21 0:32
postgres: lfriedman nightly 127.0.0.1(39356) idle
postgres 32280 0.6 2.9 14150556 1699160 ? Ss 08:21 0:20
postgres: lfriedman nightly 127.0.0.1(39361) idle
postgres 32314 0.3 1.6 14146864 930840 ? Ss 08:22 0:09
postgres: lfriedman nightly 127.0.0.1(39395) idle
postgres 32328 0.0 0.0 14134636 16396 ? Ss 08:22 0:00
postgres: lfriedman cbs 127.0.0.1(39409) idle
##############

I'm not sure if there's much to work with at this point, since is only
been 18 hours. Please let me know if I can get additional data, or if
its better to wait several more days before capturing this data again.

thanks!

Attachment Content-Type Size
ps-auxww.out application/octet-stream 5.7 KB

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Peter Warasin 2011-08-30 16:19:47 duplicate key violates unique on a nextval() field
Previous Message JD Wong 2011-08-30 16:03:48 IDLE queries taking up space