Re: [PATCHES] WAL Performance Improvements

From: Janardhana Reddy <jana-reddy(at)mediaring(dot)com(dot)sg>
To: Helge Bahmann <bahmann(at)math(dot)tu-freiberg(dot)de>
Cc: pgsql-patches <pgsql-patches(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, janareddy <jana-reddy(at)mediaring(dot)com(dot)sg>
Subject: Re: [PATCHES] WAL Performance Improvements
Date: 2002-02-26 12:33:43
Message-ID: 3C7B80A7.B9CF859C@mediaring.com.sg
Lists: pgsql-hackers pgsql-patches

Helge Bahmann wrote:

> On Tue, 26 Feb 2002, Janardhana Reddy wrote:
> > Test results with the latest patch:
> > Environment: Intel PC, IDE hard disk, Linux kernel 2.4.0. A single
> > connection to the database continuously issues insert statements.
> > Each insert generates 160 bytes of WAL.
> >
> > 8192:
> > Transactions per second: 332 TPS
> > Time taken by fdatasync: 2160
> >
> > 4096:
> > Transactions per second: 435 TPS
> > Time taken by fdatasync: 512
>
> Unfortunately your timings are meaningless. Assuming you have a
> 10000 rpm drive (that is, 166 rotations per second), it is physically
> impossible to write 332 or 435 times per second to the same location
> on the disk.
>
> So I guess your disk is performing write-caching and not really writing
> the data back when requested by fsync(). You may try to disable
> write caching and see if it makes a difference:
>
> hdparm -W 0 /dev/hda
>
> But note that most (or even all) modern IDE drives will not disable write
> caching even when instructed to do so. You should try to repeat the timings
> using SCSI drives -- I guess you will not see any improvement here.
>
> Regards
> --
> Helge Bahmann <bahmann(at)math(dot)tu-freiberg(dot)de> /| \__
> Network admin, systems programmer /_|____\
> _/\ | __)
> $ ./configure \\ \|__/__|
> checking whether build environment is sane... yes \\/___/ |
> checking for AIX... no (we already did this) |

I have tested again, but it gives the same result.
I have now also tested with a small program that just does write() and
fdatasync(), varying the size of the data passed to the write call.
There is a big difference in fdatasync time.
The test program is shown below:
------------------------------------
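#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>

int main(void)
{
    int fd;
    int i = 0;      /* only the 10000-iteration spacing between reports matters */
    int data_size;
    char buf[20000];

    memset(buf, 0, sizeof(buf));
    /* O_CREAT requires a mode argument */
    fd = open("testdata", O_CREAT | O_WRONLY, 0644);
    if (fd < 0)
    {
        perror("open");
        return 1;
    }
    data_size = 8 * 1024;   /* set to 160 for Test2 */
    while (1)
    {
        i++;
        /* rewrite the start of the file, then force the data to disk */
        lseek(fd, 0, SEEK_SET);
        write(fd, buf, data_size);
        fdatasync(fd);
        if (i % 10000 == 0)
        {
            printf("---------------\nindex: %d size: %d :\n ", i, data_size);
            fflush(stdout);
            system("date");
        }
    }
}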
=========================================================
Test1 : with test program data_size= 8*1024
the output looks as below:
./a.out
---------------
index: 134520000 size: 8192 :
Tue Feb 26 19:46:12 SGT 2002
---------------
index: 134530000 size: 8192 :
Tue Feb 26 19:46:51 SGT 2002
---------------
index: 134540000 size: 8192 :
Tue Feb 26 19:47:27 SGT 2002
---------------
index: 134550000 size: 8192 :
Tue Feb 26 19:48:04 SGT 2002
---------------
strace output:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 98.36   29.354861        3141      9347           fdatasync
  1.45    0.432686          46      9350           write
  0.15    0.044835           5      9348           lseek
  0.03    0.009375        9375         1           wait4
  0.00    0.001114        1114         1           vfork
  0.00    0.000140          35         4           rt_sigaction
  0.00    0.000007           4         2           rt_sigprocmask
------ ----------- ----------- --------- --------- ----------------
100.00   29.843018                 28053           total
======================================================================
Test2 : with test program , data_size=160
the output looks as below:
---------------
index: 134520000 size: 160 :
Tue Feb 26 19:44:41 SGT 2002
---------------
index: 134530000 size: 160 :
Tue Feb 26 19:44:44 SGT 2002
---------------
index: 134540000 size: 160 :
Tue Feb 26 19:44:48 SGT 2002
---------------
index: 134550000 size: 160 :
Tue Feb 26 19:44:52 SGT 2002
---------------
index: 134560000 size: 160 :
Tue Feb 26 19:44:56 SGT 2002

strace output :
strace -c -p 4741
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 95.54    5.672195         396     14328           fdatasync
  3.12    0.185227          13     14330           write
  1.16    0.069158           5     14329           lseek
  0.12    0.006927        6927         1           vfork
  0.05    0.003146        3146         1           wait4
  0.00    0.000020           5         4           rt_sigaction
  0.00    0.000007           4         2           rt_sigprocmask
------ ----------- ----------- --------- --------- ----------------
100.00    5.936680                 42995           total
======================================================================================

SUMMARY :

Test1 (test program, data_size = 8192):
fdatasync time + write time: 3141 + 46 = 3187 usec/call
Time taken for 10000 iterations: nearly 40 seconds
Test2 (test program, data_size = 160):
fdatasync time + write time: 396 + 13 = 409 usec/call
Time taken for 10000 iterations: nearly 4 seconds

When I test with the database by doing 10000 inserts, each of which
generates 160 bytes of WAL:
Test3 (existing database, without applying the patch):
10000 inserts = 30 seconds
fdatasync time = 2160 usec
Test4 (database with the patch applied):
10000 inserts = 23 seconds
fdatasync time = 512 usec

What I don't understand is why, in Test3 with the existing PostgreSQL
database, fdatasync takes 2160 usec, while in Test1 it takes 3141 usec,
even though both write 8192 bytes. This causes the difference in the
results.
The hard disk is not doing any write caching. During fdatasync, Linux
writes only the dirty buffers (size = 512 bytes), which causes the big
difference in fdatasync time, from 396 usec to 3141 usec.

Regards
jana
