Re: Huge Data sets, simple queries

From: "Jeffrey W(dot) Baker" <jwbaker(at)acm(dot)org>
To: Luke Lonergan <llonergan(at)greenplum(dot)com>
Cc: pgsql-performance(at)postgresql(dot)org
Subject: Re: Huge Data sets, simple queries
Date: 2006-02-01 04:09:40
Message-ID: 1138766980.14051.24.camel@noodles
Lists: pgsql-performance

On Tue, 2006-01-31 at 12:47 -0800, Luke Lonergan wrote:
> Jeffrey,
>
> On 1/31/06 12:03 PM, "Jeffrey W. Baker" <jwbaker(at)acm(dot)org> wrote:
> > Linux does balanced reads on software
> > mirrors. I'm not sure why you think this can't improve bandwidth. It
> > does improve streaming bandwidth as long as the platter STR is more than
> > the bus STR.
>
> ... Prove it.

It's clear that Linux software RAID1, and by extension RAID10, does
balanced reads, and that these balanced reads double the bandwidth. A
quick glance at the kernel source code and a trivial test prove the
point.

In this test, sdf and sdg are Seagate 15k.3 disks on a single channel of
an Adaptec 39320, but the enclosure, and therefore the bus, is capable
of only Ultra160 operation.

# grep md0 /proc/mdstat
md0 : active raid1 sdf1[0] sdg1[1]

# dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=0 &
dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=400000
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 48.243362 seconds (67922298 bytes/sec)
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 48.375897 seconds (67736211 bytes/sec)

That's 136MB/sec combined across the two readers, for those following
along at home. With only two disks in a RAID1, you can nearly max out
the 160MB/sec Ultra160 SCSI bus.

# dd if=/dev/sdf1 of=/dev/null bs=8k count=400000 skip=0 &
dd if=/dev/sdf1 of=/dev/null bs=8k count=400000 skip=400000
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 190.413286 seconds (17208883 bytes/sec)
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 192.096232 seconds (17058117 bytes/sec)

That, on the other hand, is only 34MB/sec. With two threads, the RAID1
is 296% faster.
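
For anyone checking the arithmetic: the 296% figure appears to come from
comparing the summed per-process rates of the two runs above
(67922298 + 67736211 bytes/sec for md0 against 17208883 + 17058117
bytes/sec for sdf1). A minimal sketch of that calculation follows. The
function name pct_faster is hypothetical, and it assumes awk is
available.

```shell
# Percentage by which the first rate exceeds the second.
# Both arguments must be in the same unit (here, bytes/sec).
pct_faster() {
    awk -v new="$1" -v old="$2" \
        'BEGIN { printf "%.0f\n", (new / old - 1) * 100 }'
}

# Summed rates from the two-thread runs above: md0 first, then sdf1.
pct_faster 135658509 34267000
```

With the rounded figures (136 and 34) the same formula gives 300.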

# dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=0 &
dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=400000 &
dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=800000 &
dd if=/dev/md0 of=/dev/null bs=8k count=400000 skip=1200000 &
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 174.276585 seconds (18802296 bytes/sec)
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 181.581893 seconds (18045852 bytes/sec)
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 183.724243 seconds (17835425 bytes/sec)
400000+0 records in
400000+0 records out
3276800000 bytes transferred in 184.209018 seconds (17788489 bytes/sec)

That's 71MB/sec with 4 threads...
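
The aggregate figures quoted in this message look consistent with total
bytes read divided by the longest elapsed time among the concurrent dd
processes. Here is a rough sketch of that calculation, assuming dd
summary lines in exactly the format shown above and decimal megabytes.
The function name aggregate_mb_per_sec is hypothetical.

```shell
# Read dd summary lines on stdin and print the aggregate rate in MB/sec:
# total bytes across all readers divided by the slowest reader's time.
# Assumes lines of the form "N bytes transferred in S seconds (...)".
aggregate_mb_per_sec() {
    awk '/bytes transferred in/ {
             bytes += $1                         # field 1: byte count
             if ($5 + 0 > longest) longest = $5 + 0   # field 5: seconds
         }
         END { if (longest > 0) printf "%.1f\n", bytes / longest / 1000000 }'
}
```

Piping the four summary lines from the run above into
aggregate_mb_per_sec prints 71.2.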

# dd if=/dev/sdf1 of=/dev/null bs=8k count=100000 skip=0 &
dd if=/dev/sdf1 of=/dev/null bs=8k count=100000 skip=400000 &
dd if=/dev/sdf1 of=/dev/null bs=8k count=100000 skip=800000 &
dd if=/dev/sdf1 of=/dev/null bs=8k count=100000 skip=1200000 &
100000+0 records in
100000+0 records out
819200000 bytes transferred in 77.489210 seconds (10571794 bytes/sec)
100000+0 records in
100000+0 records out
819200000 bytes transferred in 87.628000 seconds (9348610 bytes/sec)
100000+0 records in
100000+0 records out
819200000 bytes transferred in 88.912989 seconds (9213502 bytes/sec)
100000+0 records in
100000+0 records out
819200000 bytes transferred in 90.238705 seconds (9078144 bytes/sec)

Only 36MB/sec for the single disk (this run reads count=100000 per
thread rather than 400000). That's a 96% advantage for the RAID1.

# dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=0 &
dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=400000 &
dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=800000 &
dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=1200000 &
dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=1600000 &
dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=2000000 &
dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=2400000 &
dd if=/dev/md0 of=/dev/null bs=8k count=50000 skip=2800000 &
50000+0 records in
50000+0 records out
409600000 bytes transferred in 35.289648 seconds (11606803 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 42.653475 seconds (9602969 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 43.524714 seconds (9410745 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 45.151705 seconds (9071640 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 47.741845 seconds (8579476 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 48.600533 seconds (8427891 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 48.758726 seconds (8400548 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 49.679275 seconds (8244887 bytes/sec)

66MB/s with 8 threads.

# dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=0 &
dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=400000 &
dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=800000 &
dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=1200000 &
dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=1600000 &
dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=2000000 &
dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=2400000 &
dd if=/dev/sdf1 of=/dev/null bs=8k count=50000 skip=2800000 &
50000+0 records in
50000+0 records out
409600000 bytes transferred in 73.873911 seconds (5544583 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 75.613093 seconds (5417051 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 79.988303 seconds (5120749 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 79.996440 seconds (5120228 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 84.885172 seconds (4825342 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 92.995892 seconds (4404496 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 99.180337 seconds (4129851 bytes/sec)
50000+0 records in
50000+0 records out
409600000 bytes transferred in 100.144752 seconds (4090080 bytes/sec)

33MB/s. RAID1 gives a 100% advantage at 8 threads.

I think I've proved my point. Software RAID1 read balancing provides
roughly 0%, 300%, 100%, and 100% speedup on 1, 2, 4, and 8 threads,
respectively. In the presence of random I/O, the results are even
better.
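
To repeat the comparison at other thread counts without typing each dd
by hand, something along these lines should work. It is only a sketch:
the function name parallel_read and its argument order are made up
here, and it assumes a POSIX shell and a dd that accepts bs=8k.

```shell
# Launch several concurrent sequential readers against one device or file.
# Usage: parallel_read DEVICE THREADS COUNT STRIDE
#   COUNT  = number of 8k blocks each reader reads
#   STRIDE = distance in 8k blocks between adjacent readers' start offsets
parallel_read() {
    dev=$1 threads=$2 count=$3 stride=$4
    i=0
    while [ "$i" -lt "$threads" ]; do
        dd if="$dev" of=/dev/null bs=8k count="$count" \
           skip=$((i * stride)) &
        i=$((i + 1))
    done
    wait    # return only after every background dd has finished
}
```

For example, parallel_read /dev/md0 4 400000 400000 issues the same four
dd commands as the 4-thread md0 run above.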

Anyone who thinks they have a single-threaded workload has not yet
encountered the autovacuum daemon.

-Jeff
