Re: slow i/o

From: "Junaili Lie" <junaili(at)gmail(dot)com>
To: "Jignesh K(dot) Shah" <J(dot)K(dot)Shah(at)sun(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: slow i/o
Date: 2006-09-26 23:27:41
Message-ID: 8d04ce990609261627t7321d3d4v7b8b4715e24b77a5@mail.gmail.com
Lists: pgsql-performance

Hi all,
I am still encountering this issue and am doing further troubleshooting.
Here is what I found:
When I run: dtrace -s /usr/demo/dtrace/whoio.d
I see that one process is doing the majority of the i/o, but that
process is not listed in pg_stat_activity.
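When whoio.d reports a PID that pg_stat_activity does not list, it can still be mapped back to a command line from the shell; pg_stat_activity only shows client backends, so internal processes such as the background writer never appear there. A minimal sketch (the PID here is a stand-in for the one DTrace actually reports):

```shell
# Map a busy PID back to its command line. pg_stat_activity only lists
# client backends, so background processes (bgwriter, stats collector,
# archiver) do i/o without ever showing up there.
pid=$$                              # substitute the PID whoio.d reported
ps -o pid= -o args= -p "$pid"
```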
I am also seeing more queries of this type being slow:
EXECUTE <unnamed> [PREPARE: ...
I have also seen articles recommending adding these entries to
/etc/system:

set segmapsize=2684354560
set ufs:freebehind=0

I haven't tried this yet; I am wondering whether it will help.

Also, here is the output of iostat -xcznmP 1 at approximately the time of
the i/o spike:
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
4.0 213.0 32.0 2089.9 0.0 17.0 0.0 78.5 0 61 c1t0d0s6 (/usr)
cpu
us sy wt id
54 6 0 40
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.9 0.0 0.0 0 90 c1t0d0s1 (/var)
2.0 335.0 16.0 3341.6 0.2 73.3 0.6 217.4 4 100 c1t0d0s6 (/usr)
cpu
us sy wt id
30 4 0 66
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 1.0 0.0 4.0 0.0 0.1 0.0 102.0 0 10 c1t0d0s1 (/var)
1.0 267.0 8.0 2729.1 0.0 117.8 0.0 439.5 0 100 c1t0d0s6 (/usr)
cpu
us sy wt id
28 8 0 64
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.0 270.0 8.0 2589.0 0.0 62.0 0.0 228.7 0 100 c1t0d0s6 (/usr)
cpu
us sy wt id
26 2 0 72
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
2.0 269.0 16.0 2971.5 0.0 66.6 0.0 245.7 0 100 c1t0d0s6 (/usr)
cpu
us sy wt id
8 7 0 86
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.0 268.0 8.0 2343.5 0.0 110.3 0.0 410.2 0 100 c1t0d0s6 (/usr)
cpu
us sy wt id
4 4 0 92
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 260.0 0.0 2494.5 0.0 63.5 0.0 244.2 0 100 c1t0d0s6 (/usr)
cpu
us sy wt id
24 3 0 74
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.0 286.0 8.0 2519.1 35.4 196.5 123.3 684.7 49 100 c1t0d0s6 (/usr)
cpu
us sy wt id
65 4 0 30
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
2.0 316.0 16.0 2913.8 0.0 117.2 0.0 368.7 0 100 c1t0d0s6 (/usr)
cpu
us sy wt id
84 7 0 9
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
5.0 263.0 40.0 2406.1 0.0 55.8 0.0 208.1 0 100 c1t0d0s6 (/usr)
cpu
us sy wt id
77 4 0 20
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
4.0 286.0 32.0 2750.6 0.0 75.0 0.0 258.5 0 100 c1t0d0s6 (/usr)
cpu
us sy wt id
21 3 0 77
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
2.0 273.0 16.0 2516.4 0.0 90.8 0.0 330.0 0 100 c1t0d0s6 (/usr)
cpu
us sy wt id
15 6 0 78
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
2.0 280.0 16.0 2711.6 0.0 65.6 0.0 232.6 0 100 c1t0d0s6 (/usr)
cpu
us sy wt id
6 3 0 92
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.0 308.0 8.0 2661.5 61.0 220.2 197.4 712.7 67 100 c1t0d0s6 (/usr)
cpu
us sy wt id
7 4 0 90
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
1.0 268.0 8.0 2839.9 0.0 97.1 0.0 360.9 0 100 c1t0d0s6 (/usr)
cpu
us sy wt id
11 10 0 80
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 309.0 0.0 3333.5 175.2 208.9 566.9 676.2 81 99 c1t0d0s6 (/usr)
cpu
us sy wt id
0 0 0 100
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 330.0 0.0 2704.0 145.6 256.0 441.1 775.7 100 100 c1t0d0s6 (/usr)
cpu
us sy wt id
4 2 0 94
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 311.0 0.0 2543.9 151.0 256.0 485.6 823.2 100 100 c1t0d0s6 (/usr)
cpu
us sy wt id
2 0 0 98
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 319.0 0.0 2576.0 147.4 256.0 462.0 802.5 100 100 c1t0d0s6 (/usr)
cpu
us sy wt id
0 1 0 98
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 0.0 0.0 0.0 0.0 0.2 0.0 0.0 2 13 c1t0d0s1 (/var)
0.0 366.0 0.0 3088.0 124.4 255.8 339.9 698.8 100 100 c1t0d0s6 (/usr)
cpu
us sy wt id
6 5 0 90
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 2.0 0.0 16.0 0.0 1.1 0.0 533.2 0 54 c1t0d0s1 (/var)
1.0 282.0 8.0 2849.0 1.5 129.2 5.2 456.5 10 100 c1t0d0s6 (/usr)
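In several of the samples above, actv is pinned near 256 with %w and %b at 100, meaning the write queue for c1t0d0s6 never drains during the interval. One quick way to pick the saturated samples out of a capture like this (the log file name is hypothetical; the two sample lines are copied from the output above):

```shell
# Flag iostat -xcznmP samples where %b (the 10th column) is 100: the
# device was busy for the entire one-second interval. Sample lines are
# copied from the capture above into a hypothetical log file.
cat > iostat.log <<'EOF'
1.0 267.0 8.0 2729.1 0.0 117.8 0.0 439.5 0 100 c1t0d0s6 (/usr)
0.0 0.0 0.0 0.0 0.0 0.2 0.0 0.0 2 13 c1t0d0s1 (/var)
EOF
awk '$10 == 100 { print "saturated:", $11, $12 }' iostat.log
# prints: saturated: c1t0d0s6 (/usr)
```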

Thank you in advance for your help!

Jun

On 8/30/06, Junaili Lie <junaili(at)gmail(dot)com> wrote:
>
> I have tried this to no avail.
> I have also tried changing the bgwriter_delay parameter to 10. The spike
> in i/o still occurs, although not on a consistent basis, and it only
> lasts a few seconds.
>
>
>
>
> On 8/30/06, Jignesh K. Shah <J(dot)K(dot)Shah(at)sun(dot)com> wrote:
> >
> > The bgwriter parameters changed in 8.1
> >
> > Try
> >
> > bgwriter_lru_maxpages=0
> > bgwriter_lru_percent=0
> >
> > to turn off bgwriter and see if there is any change.
> >
> > -Jignesh
> >
> >
> > Junaili Lie wrote:
> > > Hi Jignesh,
> > > Thank you for your reply.
> > > I have the setting just like what you described:
> > >
> > > wal_sync_method = fsync
> > > wal_buffers = 128
> > > checkpoint_segments = 128
> > > bgwriter_all_percent = 0
> > > bgwriter_all_maxpages = 0
> > >
> > >
> > > I ran the dtrace script and found the following:
> > > During the i/o busy times, there are postgres processes with very
> > > high BYTES counts. During the non-busy times, these same processes
> > > don't do much i/o. I checked pg_stat_activity but couldn't find
> > > this process. Doing ps revealed that the process has been running
> > > since postgres started, which leads me to believe it may be the
> > > background writer or some other internal process. It is not
> > > autovacuum, because it doesn't disappear when I turn autovacuum off.
> > > Except for the ones mentioned above, I didn't modify the other
> > > background writer settings:
> > > MONSOON=# show bgwriter_delay ;
> > > bgwriter_delay
> > > ----------------
> > > 200
> > > (1 row)
> > >
> > > MONSOON=# show bgwriter_lru_maxpages ;
> > > bgwriter_lru_maxpages
> > > -----------------------
> > > 5
> > > (1 row)
> > >
> > > MONSOON=# show bgwriter_lru_percent ;
> > > bgwriter_lru_percent
> > > ----------------------
> > > 1
> > > (1 row)
> > >
> > > This i/o spike only happens at minute 1 and minute 6 (i.e. 10:51,
> > > 10:56). If I do select * from pg_stat_activity during this time, I
> > > will see a lot of write queries waiting to be processed. After a few
> > > seconds, everything seems to be gone. Writes that do not happen at
> > > the time of this i/o jump are processed very fast and thus do not
> > > show up in pg_stat_activity.
> > >
> > > Thanks in advance for the reply,
> > > Best,
> > >
> > > J
> > >
> > > On 8/29/06, Jignesh K. Shah <J(dot)K(dot)Shah(at)sun(dot)com> wrote:
> > >
> > > Also to answer your real question:
> > >
> > > DTrace On Solaris 10:
> > >
> > > # dtrace -s /usr/demo/dtrace/whoio.d
> > >
> > > It will tell you the pids doing the io activity and on which
> > > devices. There are more scripts in that directory, like iosnoop.d
> > > and iotime.d, which will also give other details such as the file
> > > accessed, the time the io took, etc.
> > >
> > > Hope this helps.
> > >
> > > Regards,
> > > Jignesh
> > >
> > >
> > > Junaili Lie wrote:
> > > > Hi everyone,
> > > > We have PostgreSQL 8.1 installed on Solaris 10. It is running
> > > > fine. However, for the past couple of days, we have seen i/o
> > > > reports indicating that the i/o is busy most of the time. Before
> > > > this, we only saw the i/o busy occasionally (very rarely). So far,
> > > > there have been no performance complaints from customers, and the
> > > > slow query reports don't indicate anything out of the ordinary.
> > > > There have been no code changes in the application layer and no
> > > > database configuration changes.
> > > > I am wondering if there's a tool on Solaris to tell which process
> > > > is doing most of the i/o activity?
> > > > Thank you in advance.
> > > >
> > > > J
> > > >
> > >
> > >
> >
>
>
