Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>
Cc: <pgsql-hackers(at)postgresql(dot)org>, <noah(at)leadboat(dot)com>
Subject: Re: Re: [WIP] Performance Improvement by reducing WAL for Update Operation
Date: 2012-09-27 13:08:39
Message-ID: 00aa01cd9cb1$36c29a40$a447cec0$@kapila@huawei.com
Lists: pgsql-hackers

> On Thursday, September 27, 2012 4:12 PM Heikki Linnakangas wrote:
> On 25.09.2012 18:27, Amit Kapila wrote:
> > If you feel it is a must to do the comparison, we can do it in the
> > same way as we identify changed columns for HOT?
>
> Yeah. (But as discussed, I think it would be even better to just treat
> the old and new tuple as an opaque chunk of bytes, and run them through
> a generic delta algorithm).
>

Thank you for the modified patch.
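
To make sure I follow the opaque-bytes suggestion, here is a rough sketch
of what I think a generic delta encoder over the raw tuple bytes would
look like. All names (encode_delta, the tag values, MIN_MATCH) are
illustrative only, and the O(old_len * new_len) match search is just for
clarity; a real encoder would use a pglz-style hash table of old-tuple
offsets:

/*
 * Sketch: delta-encode the new tuple against the old one, treating both
 * as opaque byte strings.
 */
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MIN_MATCH 4             /* shorter matches cost more than literals */

static size_t
encode_delta(const uint8_t *oldt, size_t old_len,
             const uint8_t *newt, size_t new_len,
             uint8_t *out)      /* caller provides worst-case space */
{
    size_t      n = 0;
    size_t      o = 0;

    while (n < new_len)
    {
        size_t  best_off = 0;
        size_t  best_len = 0;
        size_t  i;

        /* find the longest match of newt[n..] anywhere in the old tuple */
        for (i = 0; i < old_len; i++)
        {
            size_t  len = 0;

            while (i + len < old_len && n + len < new_len &&
                   oldt[i + len] == newt[n + len])
                len++;
            if (len > best_len)
            {
                best_len = len;
                best_off = i;
            }
        }

        if (best_len >= MIN_MATCH)
        {
            uint32_t    off32 = (uint32_t) best_off;
            uint32_t    len32 = (uint32_t) best_len;

            out[o++] = 0x01;    /* tag: copy <off,len> from old tuple */
            memcpy(out + o, &off32, sizeof(off32));
            o += sizeof(off32);
            memcpy(out + o, &len32, sizeof(len32));
            o += sizeof(len32);
            n += best_len;
        }
        else
        {
            out[o++] = 0x00;    /* tag: one literal byte from new tuple */
            out[o++] = newt[n++];
        }
    }
    return o;                   /* encoded size; this is what gets WAL-logged */
}

int
main(void)
{
    const char  oldt[] = "id=42 name=alice balance=100.00";
    const char  newt[] = "id=42 name=alice balance=250.00";
    uint8_t     out[256];
    size_t      sz = encode_delta((const uint8_t *) oldt, sizeof(oldt),
                                  (const uint8_t *) newt, sizeof(newt), out);

    printf("new tuple %zu bytes, delta %zu bytes\n", sizeof(newt), sz);
    return 0;
}

The point of the sketch is that the encoder never needs to know column
boundaries: swapping two columns or inserting a character in the middle
of a long string simply becomes an extra copy tag.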

> The conclusion is that there isn't very much difference among the
> patches. They all squeeze the WAL to about the same size, and the
> increase in TPS is roughly the same.
>
> I think more performance testing is required. The modified pgbench test
> isn't necessarily very representative of a real-life application. The
> gain (or loss) of this patch is going to depend a lot on how many
> columns are updated, and in what ways. Need to test more scenarios,
> with many different database schemas.
>
> The LZ approach has the advantage that it can take advantage of all
> kinds of similarities between old and new tuple. For example, if you
> swap the values of two columns, LZ will encode that efficiently. Or if
> you insert a character in the middle of a long string. On the flipside,
> it's probably more expensive. Then again, you have to do a memcmp() to
> detect which columns have changed with your approach, and that's not
> free either. That was not yet included in the patch version I tested.
> Another consideration is that when you compress the record more, you
> have less data to calculate CRC for. CRC calculation tends to be quite
> expensive, so even quite aggressive compression might be a win. Yet
> another consideration is that the compression/encoding is done while
> holding a lock on the buffer. For the sake of concurrency, you want to
> keep the duration the lock is held as short as possible.
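
Regarding the memcmp() cost you mention: the per-column check I had in
mind is essentially the one HOT update performs to see whether any
indexed column changed. A rough sketch, with illustrative names and
plain pointer/length pairs standing in for Datums fetched via
heap_getattr():

/*
 * Sketch: detect modified columns with memcmp(), analogous to the check
 * HOT uses to decide whether any indexed column changed.
 */
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct ColumnValue
{
    const void *data;           /* NULL means SQL NULL */
    size_t      len;
} ColumnValue;

/*
 * Set modified[i] for every column whose old and new values differ.
 * Returns the number of modified columns, so the caller can decide
 * whether delta-encoding the WAL record is worthwhile at all.
 */
static int
find_modified_columns(const ColumnValue *oldvals,
                      const ColumnValue *newvals,
                      int natts, bool *modified)
{
    int     nmodified = 0;
    int     i;

    for (i = 0; i < natts; i++)
    {
        bool    same;

        if (oldvals[i].data == NULL || newvals[i].data == NULL)
            same = (oldvals[i].data == newvals[i].data);    /* both NULL? */
        else
            same = (oldvals[i].len == newvals[i].len &&
                    memcmp(oldvals[i].data, newvals[i].data,
                           oldvals[i].len) == 0);

        modified[i] = !same;
        if (!same)
            nmodified++;
    }
    return nmodified;
}

As you say, this comparison is not free; on the other hand, every byte
squeezed out of the record is also a byte the CRC no longer has to
cover, so the two costs should partly offset each other.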

I shall now run various tests for the following and post the results here:
a. The attached patch in the mode where it takes advantage of the history
tuple
b. The same patch with the modified-column detection logic changed to use
the memcmp() comparison sketched above

With Regards,
Amit Kapila.
