Re: Interpreting vmstat

From: Robert Treat <xzilla(at)users(dot)sourceforge(dot)net>
To: Doug Y <dylists(at)ptd(dot)net>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Interpreting vmstat
Date: 2004-05-25 21:07:51
Message-ID: 1085519271.9368.1916.camel@camel
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-performance

On Tue, 2004-05-18 at 14:12, Doug Y wrote:
> Run a query I've been having trouble with and watch the output of vmstat
> (linux):
>
> $ vmstat 1
> procs memory swap io system
> cpu
> r b w swpd free buff cache si so bi bo in cs us
> sy id
> 0 0 0 148 8732 193652
> 2786668 0 0 0 0 292 151 0 2 98
> 2 0 2 148 7040 193652
> 2786668 0 0 0 208 459 697 45 10 45
> 0 0 0 148 9028 193652
> 2786684 0 0 16 644 318 613 25 4 71
> 1 0 0 148 5092 193676
> 2780196 0 0 12 184 441 491 37 5 58
> 0 1 0 148 5212 193684
> 2772512 0 0 112 9740 682 1063 45 12 43
> 1 0 0 148 5444 193684
> 2771584 0 0 120 4216 464 1303 44 3 52
> 1 0 0 148 12232 193660
> 2771620 0 0 244 628 340 681 43 20 38
> 1 0 0 148 12168 193664
> 2771832 0 0 196 552 332 956 42 2 56
> 1 0 0 148 12080 193664
> 2772248 0 0 272 204 371 201 40 1 59
> 1 1 0 148 12024 193664
> 2772624 0 0 368 0 259 127 42 3 55
>
> Thats the first 10 lines or so... the query takes 60 seconds to run.
>
> I'm confused on the bo & bi parts of the io:
> IO
> bi: Blocks sent to a block device (blocks/s).
> bo: Blocks received from a block device (blocks/s).
>
> yet it seems to be opposite of that... bi only increases when doing a
> largish query, while bo also goes up, I typically see periodic bo numbers
> in the low 100's, which I'd guess are log writes.
>
> I would think that my entire DB should end up cached since a raw pg_dump
> file is about 1G in size, yet my performance doesn't indicate that that is
> the case... running the same query a few minutes later, I'm not seeing a
> significant performance improvement.
>

Been meaning to try and address this thread since it touches on one of
the areas that I think is sorely lacking in the postgresql admin
knowledge base; how to use various unix commands to deduce performance
information. This would seem even more important given that PostgreSQL
admins are expected to use said tools to find out some information that
the commercial databases provide to there users... but alas this is
-performance and not -advocacy so let me get on with it eh?

As you noted, bi and bo are actually reversed, and I believe if you
search the kernel hackers mailing lists you'll find references to
this... here's some more empirical evidence though, the following
vmstat was taken from a high-write traffic monitoring type database
application...

procs memory swap io system cpu
r b w swpd free buff cache si so bi bo in cs us sy id
1 0 0 27412 593336 112036 1865936 0 1 0 1 1 0 1 0 0
5 1 1 27412 593336 112036 1865952 0 0 0 477 600 1346 53 7 40
4 0 0 27412 593336 112036 1865960 0 0 0 1296 731 2087 47 5 48
3 3 2 27412 594408 112052 1865972 4 0 4 2973 904 2957 32 20 48
3 1 1 26596 594544 112068 1865976 64 0 64 1433 770 2766 41 22 37
1 1 1 26596 594544 112072 1866004 0 0 5 959 702 1687 50 10 41
3 1 1 26596 594512 112072 1866024 0 0 0 1155 731 2209 52 12 37
2 0 0 26596 594512 112072 1866040 0 0 0 635 511 1293 48 5 46
0 1 1 26596 594472 112076 1866076 0 0 7 739 551 1248 49 8 43
1 0 0 26596 594472 112076 1866088 0 0 0 1048 598 1295 49 8 43
2 0 0 26596 594208 112084 1866696 0 0 203 1253 686 1506 42 16 41
1 0 0 26596 593920 112084 1866716 0 0 0 1184 599 1329 39 12 49
0 1 1 26596 593060 112084 1866740 0 0 3 1036 613 3442 48 8 44
0 1 2 26596 592920 112084 1866752 0 0 0 3825 836 1323 9 14 76
0 0 0 26596 593544 112084 1866788 0 0 0 1064 625 1197 9 15 76
0 1 1 26596 596300 112088 1866808 0 0 0 747 625 1558 7 13 79
0 0 1 26596 599600 112100 1866892 0 0 0 468 489 1331 6 4 91
0 0 0 26596 599600 112100 1866896 0 0 0 237 418 997 5 4 91
0 1 1 26596 599600 112104 1866896 0 0 0 1063 582 1371 7 7 86
0 0 0 26596 599612 112104 1866904 0 0 0 561 648 1556 6 4 89

notice all the bo as it continually writes data to disk. Also notice how
generally speaking it has no bi since it does not have to pull data up
from disk... you will notice that the couple of times it grabs
information from swap space, you'll also find a corresponding pull on
the io.

getting back to your issue in order to determine if there is a problem
in this case, you need to run explain analyze a few times repeatedly,
take the relative score given by these runs, and then come back 5-10
minutes later and run explain analyze again and see if the results are
drastically different. troll vmstat while you do this to see if there is
bi occurring. I probably should mention that just because you see
activity on bi doesn't mean that you'll notice any difference in
performance against running the query with no bi, it's dependent on a
number of factors really.

Oh, and as the other poster alluded to, knock down your shared buffers
by about 50% and see where that gets you. I might also knock *up* your
effective cache size... try doubling that and see how things go.

Hope this helps... and others jump in with corrections if needed.

Robert Treat
--
Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL

In response to

Browse pgsql-performance by date

  From Date Subject
Next Message Marty Scholes 2004-05-25 22:24:18 Re: PostgreSQL caching
Previous Message Vitaly Belman 2004-05-25 19:53:05 Re: PostgreSQL caching