Re: [WIP] Performance Improvement by reducing WAL for Update Operation

From: Noah Misch <noah@leadboat.com>
To: Amit kapila <amit.kapila@huawei.com>
Cc: "hlinnakangas@vmware.com" <hlinnakangas@vmware.com>, "pgsql-hackers@postgresql.org" <pgsql-hackers@postgresql.org>
Subject: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-10-24 15:27:48
Message-ID: 20121024152748.GC22334@tornado.leadboat.com
Lists: pgsql-hackers

On Wed, Oct 24, 2012 at 05:55:56AM +0000, Amit kapila wrote:
> Wednesday, October 24, 2012 5:51 AM Noah Misch wrote:
> > Stepping back a moment, I would expect this patch to change performance in at
> > least four ways (Heikki largely covered this upthread):
>
> > a) High-concurrency workloads will improve thanks to reduced WAL insert
> > contention.
> > b) All workloads will degrade due to the CPU cost of identifying and
> > implementing the optimization.
> > c) Workloads starved for bulk WAL I/O will improve due to reduced WAL volume.
> > d) Workloads composed primarily of long transactions with high WAL volume will
> > improve due to having fewer end-of-WAL-segment fsync requests.
>
> All your points are a very good summarization of the work, but I think one point can be added:
> e) Reduced cost of computing the CRC and of copying data into the xlog buffer in XLogInsert(), due to the reduced size of the xlog record.

True.

> > Your benchmark numbers show small gains and losses for single-client
> > workloads, moving to moderate gains for 2-client workloads. This suggests
> > strong influence from (a), some influence from (b), and little influence from
> > (c) and (d). Actually, the response to scale evident in your numbers seems
> > too good to be true; why would (a) have such a large effect over the
> > transition from one client to two clients?
>
> I think if we look at this just from the point of view of LZ compression, there are predominantly two effects: your point (b) and my point (e).
> For a single client, the cost of doing compression supersedes the savings in the CRC and the other improvements in XLogInsert().
> However, with multiple clients, the cost reduction due to point (e) reduces the time spent under the lock, and hence we see such an effect from
> 1 client to 2 clients.

Note that the CRC calculation over variable-size data in the WAL record
happens before taking WALInsertLock.
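
To make the ordering concrete, here is a minimal standalone sketch; it is not
the actual XLogInsert() source, and wal_lock, wal_buffer, and the toy checksum
are stand-ins invented purely for illustration:

#include <pthread.h>
#include <stdint.h>
#include <string.h>

static pthread_mutex_t wal_lock = PTHREAD_MUTEX_INITIALIZER;
static char wal_buffer[65536];
static size_t wal_insert_pos = 0;

/* Toy checksum standing in for the real CRC-32 calculation. */
static uint32_t
toy_crc(const char *data, size_t len)
{
    uint32_t crc = 0;
    size_t i;

    for (i = 0; i < len; i++)
        crc = (crc << 5) ^ (crc >> 27) ^ (uint8_t) data[i];
    return crc;
}

static void
wal_insert(const char *rec, size_t len)
{
    /* The CRC cost in point (e) is paid here, before taking the lock. */
    uint32_t crc = toy_crc(rec, len);

    (void) crc;     /* would be stored in the record header */

    /*
     * Only this copy runs under the lock, so a smaller record shortens
     * the critical section; the CRC savings do not.
     */
    pthread_mutex_lock(&wal_lock);
    memcpy(wal_buffer + wal_insert_pos, rec, len);
    wal_insert_pos += len;
    pthread_mutex_unlock(&wal_lock);
}

int
main(void)
{
    wal_insert("example record", strlen("example record"));
    return 0;
}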

> > Also, for whatever reason, all
> > your numbers show fairly bad scaling. With the XLOG scale and LZ patches,
> > synchronous_commit=off, -F 80, and rec length 250, 8-client average
> > performance is only 2x that of 1-client average performance.

Correction: with the XLOG scale patch only, your benchmark runs show 8-client
average performance as 2x that of 1-client average performance. With both the
XLOG scale and LZ patches, it grows to almost 4x. However, both ought to be
closer to 8x.

> > -Patch-              -tps@-c1-  -tps@-c2-  -tps@-c8-  -WAL@-c8-
> > HEAD,-F80                  816       1644       6528   1821 MiB
> > xlogscale,-F80             824       1643       6551   1826 MiB
> > xlogscale+lz,-F80          717       1466       5924   1137 MiB
> > xlogscale+lz,-F100         753       1508       5948   1548 MiB
>
> > Those are short runs with no averaging of multiple iterations; don't put too
> > much faith in the absolute numbers. Still, I consistently get linear scaling
> > from 1 client to 8 clients. Why might your results have been so different in
> > this regard?
>
> 1. The only reason you see a difference in linear scalability may be that the numbers I posted for 8 threads are
> from a run with -c16 -j8. I shall run with -c8 and post the performance numbers; I am hoping they will match the way you see the numbers.

I doubt that. Your 2-client numbers also show scaling well below linear.
With 8 cores, 16-client performance should not fall off compared to 8 clients.

Perhaps 2 clients saturate your I/O under this workload, but 1 client does
not. Granted, that theory doesn't explain all your numbers, such as the
improvement for record length 50 @ -c1.

> 2. Now, if we look at the results you have posted,
> a) there is not much performance difference between HEAD and xlog scale

Note that the xlog scale patch addresses a different workload:
http://archives.postgresql.org/message-id/505B3648.1040801@vmware.com

> b) with the LZ patch there is a decrease in performance.
> I think this can be because it ran for a very short time, as you have also mentioned.

Yes, that's possible.

> > It's also odd that your -F100 numbers tend to follow your -F80 numbers despite
> > the optimization kicking in far more frequently for the latter.
>
> The results for the LZ patch, each an average of three 15-minute runs, are:
> -Patch-             -tps@-c1-  -tps@-c2-  -tps@-c16-j8-
> xlogscale+lz,-F80         663       1232          2498
> xlogscale+lz,-F100        660       1221          2361
>
> The results show that average tps is better with -F80, which I think is what is expected.

Yes. Let me elaborate on the point I hoped to make. Based on my test above,
-F80 more than doubles the bulk WAL savings compared to -F100 (taking the
xlogscale -F80 run as an approximate baseline for both: 1826 - 1137 = 689 MiB
saved at -F80 versus 1826 - 1548 = 278 MiB at -F100). Your benchmark runs
showed a 61.8% performance improvement at -F100 and a 62.5% performance
improvement at -F80. If shrinking WAL increases performance, shrinking it
more should increase performance more. Instead, you observed similar levels
of improvement at both fill factors. Why?

> So to conclude, according to me, the following needs to be done:
>
> 1. To check the major discrepancy in the data about linear scaling, I shall take the data with the -c8 configuration rather than with -c16 -j8.

With unpatched HEAD, synchronous_commit=off, and sufficient I/O bandwidth, you
should be able to get pgbench to scale linearly to 8 clients. You can then
benchmark for effects (a), (b) and (e). With insufficient I/O bandwidth,
you're benchmarking (c) and (d). (And/or other effects I haven't considered.)
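
For instance, something along these lines, with synchronous_commit = off in
postgresql.conf; the scale factor, run length, database name "bench", and
update.sql (standing in for your custom update script) are illustrative, not
prescriptive:

    pgbench -i -s 100 -F 80 bench                    # init tables at fillfactor 80
    pgbench -n -c 8 -j 8 -T 900 -f update.sql bench  # 8 clients, 8 threads, 15 min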

> 2. To conclude whether the LZ patch gives better performance, I think it needs to be run for a longer time.

Agreed.

> Please let me know your opinion on the above; do we need to do anything more than what is mentioned?

I think the next step is to figure out what limits your scaling. Then we can
form a theory about the meaning of your benchmark numbers.

nm
