Re: Performance Improvement by reducing WAL for Update Operation

From: Simon Riggs <simon(at)2ndQuadrant(dot)com>
To: Kyotaro HORIGUCHI <horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: amit(dot)kapila(at)huawei(dot)com, hlinnakangas(at)vmware(dot)com, noah(at)leadboat(dot)com, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Performance Improvement by reducing WAL for Update Operation
Date: 2012-12-28 10:21:38
Message-ID: CA+U5nMLSrKZrpQVN-QPDEmtC1tNDSqWsUqobcyNHo+0aDGntig@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 28 December 2012 08:07, Kyotaro HORIGUCHI
<horiguchi(dot)kyotaro(at)lab(dot)ntt(dot)co(dot)jp> wrote:

> Hello, I saw this patch and confirmed that
>
> - Coding style looks good.
> - Appliable onto HEAD.
> - Some mis-codings are fixed.

I've had a quick review of the patch to see how close we're getting.
The perf tests look to me like we're getting what we wanted from this
and I'm happy with the recovery performance trade-offs. Well done to
both author and testers.

My comments

* There is a fixed 75% heuristic in the patch. Can we document where
that came from? Can we have a parameter that sets that please? This
can be used to have further tests to confirm the useful setting of
this. I expect it to be removed before we release, but it will help
during beta.

* The compression algorithm depends completely upon new row length
savings. If the new row is short, it would seem easier to just skip
the checks and include it anyway. We can say if old and new vary in
length by > 50% of each other, just include new as-is, since the rows
very clearly differ in a big way. Also, if tuple is same length as
before, can we compare the whole tuple at once to save doing
per-column checks?

* If full page writes is on and the page is very old, we are just
going to copy the whole block. So why not check for that rather than
do all these push ups and then just copy the page anyway?

* TOAST is not handled at all. No comments about it, nothing. Does
that mean it hasn't been considered? Or did we decide not to care in
this release? Presumably that means we are comparing toast pointers
byte by byte to see if they are the same?

* I'd like to see a specific test in regression that is designed to
exercise the code here. That way we will be certain that the code is
getting regularly tested.

* The internal docs are completely absent. We need at least a whole
page of descriptive comment, discussing trade-offs and design
decisions. This is very important because it will help locate bugs
much faster if these things are clealry documented. It also helps
reviewers. This is a big timewaster for committers because you have to
read the whole patch and understand it before you can attempt to form
opinions. Commits happen quicker and easier with good comments.

* Lots of typos in comments. Many comments say nothing more than the
words already used in the function name itself

* "flags" variables are almost always int or uint in PG source.

* PGLZ_HISTORY_SIZE needs to be documented in the place it is defined,
not the place its used. The test if (oldtuplen < PGLZ_HISTORY_SIZE)
really needs to be a test inside the compression module to maintain
better modularity, so the value itself needn't be exported

* Need to mention the WAL format change, or include the change within
the patch so we can see

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2012-12-28 11:27:23 Re: Performance Improvement by reducing WAL for Update Operation
Previous Message Dimitri Fontaine 2012-12-28 10:04:29 Re: Proposal: Store "timestamptz" of database creation on "pg_database"