Re: Performance Improvement by reducing WAL for Update Operation

From: Amit Kapila <amit(dot)kapila(at)huawei(dot)com>
To: "'Heikki Linnakangas'" <hlinnakangas(at)vmware(dot)com>
Cc: <simon(at)2ndquadrant(dot)com>, "'Alvaro Herrera'" <alvherre(at)2ndquadrant(dot)com>, <noah(at)leadboat(dot)com>, <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>, <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date: 2013-01-29 09:58:34
Message-ID: 008901cdfe07$3405e480$9c11ad80$@kapila@huawei.com
Lists: pgsql-hackers

On Tuesday, January 29, 2013 2:53 AM Heikki Linnakangas wrote:
> On 28.01.2013 15:39, Amit Kapila wrote:
> > Rebased the patch as per HEAD.
>
> I don't like the way heap_delta_encode has intimate knowledge of how
> the lz compression works. It feels like a violent punch through the
> abstraction layers.
>
> Ideally, you would just pass the old and new tuple to pglz as char *,
> and pglz code would find the common parts. But I guess that's too slow,
> as that's what I originally suggested and you rejected that approach.
> But even if that's not possible on performance grounds, we don't need
> to completely blow up the abstraction. pglz can still do the encoding -
> the caller just needs to pass it the attribute boundaries to consider
> for matches, so that it doesn't need to scan them byte by byte.
>
> I came up with the attached patch. I wrote it to demonstrate the API,
> I'm not 100% sure the result after decoding is correct.
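
For reference, the call signature implied by the attached patch appears to
be roughly the following (a sketch inferred from the call site quoted below;
the exact prototype and return type are my assumptions, not copied from the
patch):

    /*
     * Hypothetical prototype, inferred from the call site: "source" is the
     * buffer to encode (the new tuple), "history" is the buffer to search
     * for matches (the old tuple), and offsets[] lists the attribute start
     * positions pglz should consider instead of scanning byte by byte.
     */
    extern bool pglz_compress_with_history(const char *source, int32 slen,
                                           const char *history, int32 hlen,
                                           int32 *offsets, int32 noffsets,
                                           PGLZ_Header *dest,
                                           const PGLZ_Strategy *strategy);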

I have checked the patch code and found a few problems.

1. The history should be the old tuple, so the call below needs its
   argument pairs swapped:

    /*
    return pglz_compress_with_history((char *) oldtup->t_data, oldtup->t_len,
                                      (char *) newtup->t_data, newtup->t_len,
                                      offsets, noffsets,
                                      (PGLZ_Header *) encdata, &strategy);
    */
    return pglz_compress_with_history((char *) newtup->t_data, newtup->t_len,
                                      (char *) oldtup->t_data, oldtup->t_len,
                                      offsets, noffsets,
                                      (PGLZ_Header *) encdata, &strategy);

   That is, the new tuple is the source to be encoded and the old tuple is
   the history to search for matches.
2. The offsets array should hold the starting offset of each column, so
   the assignment below must be made before advancing 'off', not after:

    offsets[noffsets++] = off;

    off = att_addlength_pointer(off, thisatt->attlen, tp + off);

    if (thisatt->attlen <= 0)
        slow = true;            /* can't use attcacheoff anymore */

    /* offsets[noffsets++] = off; */
}
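
To illustrate the difference, here is a minimal standalone sketch (not the
actual patch code; attlens[] is a stand-in for the attribute lengths in the
tuple descriptor, and varlena handling is omitted for brevity):

    /*
     * Record each column's start offset *before* stepping past it, so the
     * array holds column start positions rather than end positions.
     */
    static int
    collect_column_offsets(const int *attlens, int natts, int *offsets)
    {
        int     off = 0;
        int     noffsets = 0;
        int     i;

        for (i = 0; i < natts; i++)
        {
            offsets[noffsets++] = off;      /* column starts here */
            off += attlens[i];              /* then advance past it */
        }
        return noffsets;
    }

For three fixed-width columns of 4, 8 and 4 bytes this yields offsets
{0, 4, 12}; the original placement would have produced the column end
positions {4, 12, 16} instead.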

Apart from this, some of the test cases are failing, which I need to
investigate.

I have debugged the new code, and it appears to me that it will not be as
efficient as the current approach in the patch: it has to build a hash
table over the history (the old tuple) for match lookups, which adds
overhead compared to the existing approach. I am collecting the performance
and WAL-reduction data.
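
To make the suspected overhead concrete: before any matching can happen,
every position of the history buffer must be indexed in a hash table,
roughly along these lines (an illustrative sketch of the usual LZ77-style
setup cost, not the actual pglz internals; all names here are invented):

    #define HIST_HASH_SIZE 8192

    /* maps a small hash of the next few bytes -> position in history */
    static int hist_hash[HIST_HASH_SIZE];

    static int
    hist_hash_calc(const unsigned char *p)
    {
        /* hash of three consecutive bytes, as in typical LZ77 coders */
        return ((p[0] << 6) ^ (p[1] << 3) ^ p[2]) % HIST_HASH_SIZE;
    }

    static void
    build_history_index(const unsigned char *hist, int hlen)
    {
        int     i;

        /* O(hlen) work per update, paid before a single byte is encoded */
        for (i = 0; i + 3 <= hlen; i++)
            hist_hash[hist_hash_calc(hist + i)] = i;
    }

This per-update setup cost is what the current patch avoids by comparing
the tuples column by column directly.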

Could there be another way to improve the current patch code, so that we
don't have to change the encoding approach? I have a feeling that the new
approach may not be equally good performance-wise.

With Regards,
Amit Kapila.
