Re: [PATCHES] WAL Performance Improvements

From: Janardhana Reddy <jana-reddy(at)mediaring(dot)com(dot)sg>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-patches <pgsql-patches(at)postgresql(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, janareddy <jana-reddy(at)mediaring(dot)com(dot)sg>
Subject: Re: [PATCHES] WAL Performance Improvements
Date: 2002-02-25 07:11:32
Message-ID: 3C79E3A4.77005625@mediaring.com.sg
Lists: pgsql-hackers pgsql-patches

Tom Lane wrote:

> Janardhana Reddy <jana-reddy(at)mediaring(dot)com(dot)sg> writes:
> > I've attached a patch which should improve the performance of WAL by
> > reducing the fsync time and write time by 50% (if the OS page size is
> > 4k), if the transaction generates less than 4k of WAL data. Instead of
> > writing 8k of data into the WAL file every time, it will write only
> > the portion of the data which has changed since the last time
> > (example: if a transaction generates 150 bytes of WAL data, then it
> > writes only 150 bytes instead of 8k).
>
> As near as I can tell, this breaks WAL by failing to ensure that the
> rest of the current page is zeroed. After crash and recovery, you might
> read obsolete WAL records (written during the previous cycle of life
> of the WAL segment file) and think they are valid.
>
> I'd also be interested to see the measurements backing up the claim of 50%
> performance improvement. That'd depend very largely on the filesystem block
> size, no?
>
> regards, tom lane

Correct, this breaks WAL by failing to ensure that the rest of the
current page is zeroed when the WAL file is reused. I am thinking of
fixing this by writing an extra WAL record (a few zeroed bytes) whenever
a write is smaller than BLKSIZE; this should fix the problem.
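
To make the failure mode and the proposed fix concrete, here is a
minimal Python sketch. It is not PostgreSQL code: the record layout (a
2-byte length prefix followed by the payload), the 64-byte toy page and
all helper names are assumptions invented for this illustration, and
real WAL records carry more validation than this model does.

```python
import struct

PAGE_SIZE = 64  # toy page size, assumed for illustration only


def pack(payloads):
    """Encode records as <2-byte little-endian length><payload>."""
    return b"".join(struct.pack("<H", len(p)) + p for p in payloads)


def write_partial(page, payloads, terminate):
    """Overwrite only the start of a reused page, as the patch does.

    If terminate is True, a zeroed length field is appended after the
    last record (the "extra zeroed record" idea from the message).
    """
    data = pack(payloads)
    if terminate:
        data += b"\x00\x00"
    assert len(data) <= len(page)
    page[:len(data)] = data


def read_records(page):
    """Scan records until a zero length or the end of the page."""
    out, pos = [], 0
    while pos + 2 <= len(page):
        (n,) = struct.unpack_from("<H", page, pos)
        if n == 0 or pos + 2 + n > len(page):
            break
        out.append(bytes(page[pos + 2:pos + 2 + n]))
        pos += 2 + n
    return out


def reused_page():
    """A page still holding records from the segment's previous life."""
    page = bytearray(PAGE_SIZE)
    write_partial(page, [b"old1", b"old2", b"old3"], terminate=False)
    return page


# Without the terminator, stale records are read after the new one.
unsafe = reused_page()
write_partial(unsafe, [b"new1"], terminate=False)
print(read_records(unsafe))  # [b'new1', b'old2', b'old3']

# With the zeroed terminator, the scan stops after the new record.
safe = reused_page()
write_partial(safe, [b"new1"], terminate=True)
print(read_records(safe))    # [b'new1']
```

The stale records line up here only because the new record happens to
be the same length as the old one it overwrites; with other sizes the
reader would hit arbitrary leftover bytes instead, which is no safer.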

I think the performance improvement depends on the OS page size, since
the OS checks which pages are dirty and writes each dirty page in full
at sync time, even if only a few bytes of the page were modified. I
think the page size on Linux is 4k. The measurements from the test on
Linux are as follows:

This is output of "strace -c" of the backend before the patch is applied:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 87.75    0.462903        2269       204           fdatasync
  6.13    0.032322         158       204           send
  2.91    0.015330          75       204           recv
  2.55    0.013477          63       214           write
  0.23    0.001226           6       210           lseek
  0.21    0.001089           5       204           time
  0.15    0.000765           4       204           gettimeofday
  0.07    0.000362          91         4           read
  0.01    0.000035          35         1           open
------ ----------- ----------- --------- --------- ----------------
100.00    0.527509                  1449           total

This is the output after the patch is applied:
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 47.92    0.101630         498       204           fdatasync
 47.14    0.099969         490       204           recv
  2.30    0.004879          23       215           write
  1.57    0.003340          16       204           send
  0.51    0.001084           5       204           time
  0.38    0.000809           4       204           gettimeofday
  0.13    0.000269          67         4           read
  0.02    0.000046           7         7           lseek
  0.02    0.000041          41         1           open
------ ----------- ----------- --------- --------- ----------------
100.00    0.212067                  1247           total

The main improvement comes from fdatasync, which drops from 2269 usec
to 498 usec per call. I expected the fdatasync time to fall by 50%
(since Linux 2.4 uses a 4K page size), but all the tests show a
reduction of about 75%. In all the tests each transaction writes
150 bytes into the WAL file.
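
The expected and observed figures can be checked with a short Python
sketch. The dirty_pages() helper is hypothetical and assumes each write
starts at an OS page boundary (offset 0); a 150-byte record that
straddles a boundary would dirty two pages. The usecs/call values are
taken from the two strace runs above.

```python
def dirty_pages(nbytes, offset, os_page_size):
    """Number of OS pages touched by a write of nbytes at offset."""
    if nbytes <= 0:
        return 0
    first = offset // os_page_size
    last = (offset + nbytes - 1) // os_page_size
    return last - first + 1


OS_PAGE = 4096    # Linux 2.4 page size, as stated in the message
WAL_BLOCK = 8192  # bytes written per commit before the patch (8k)
RECORD = 150      # WAL bytes per transaction in the tests

before_pages = dirty_pages(WAL_BLOCK, 0, OS_PAGE)  # 2 pages
after_pages = dirty_pages(RECORD, 0, OS_PAGE)      # 1 page
expected_reduction = 1 - after_pages / before_pages

# fdatasync usecs/call: 2269 before the patch, 498 after.
observed_reduction = 1 - 498 / 2269

print(f"expected: {expected_reduction:.0%}, "
      f"observed: {observed_reduction:.0%}")
# expected: 50%, observed: 78%
```

The page-count model alone accounts for the 50% figure; it does not
explain the remaining gap to the measured value.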

regards
jana
