Re: [HACKERS] Proposal: generic WAL compression

From: Antonin Houska <ah(at)cybertec(dot)at>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Oleg Ivanov <o(dot)ivanov(at)postgrespro(dot)ru>
Subject: Re: [HACKERS] Proposal: generic WAL compression
Date: 2017-11-16 17:31:23
Message-ID: 23990.1510853483@localhost
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Oleg Ivanov <o(dot)ivanov(at)postgrespro(dot)ru> wrote:

> In order to overcome that issue, I would like to propose the patch, which
> provides possibility to use another approach of the WAL record
> construction.

After having spent several hours reviewing this patch I dare to send the
following comments:

* writeToPtr() and readFromPtr() are applied to the existing code. I think
this is a reason for a separate diff, to be applied before the actual patch.

BTW, if you're going to do any refactoring of the existing code, I think the
"Begin" and "Start" words should be used consistently, see "fragmentBegin" vs
"validStart" in computeRegionBaseDelta().

* What's the reason for changing FRAGMENT_HEADER_SIZE ? I see no change in
the data layout that writeFragment() produces.

* alignRegions()

** typo in the prologue: "mismathing".

** "An O(ND) Difference Algorithm and Its Variations" - I think the reference
should contain more information, e.g. name of the author(s). I think I even
saw URLs to scientific papers in the PG source but I'm not 100% sure right
now.

** As for the algorithm itself: Although I didn't have enough patience (and
time) to follow it in every detail for this first review, I think I can
imagine what it is about. Yet I think reading would be easier if the
concepts of alignment and "diff delta" were depicted in the comments,
perhaps using some sort of "ASCII graphics".

** DiffDeltaOperations enumeration: the individual values should be described.

* computeRegionDiffDelta()

** Does "previous delta" in one of the comments refer to "base delta",
i.e. the delta computation w/o your patch? If so, the comment should
mention it explicitly.

** If you use Min() to initialize the for-loop, it'd be more consistent if you
also used Max() when checking the limits:

for (i = Min(validStart, targetStart); i < Max(validEnd, targetEnd); ++i)

And similarly you can simplify the body of the loop.

* computeDelta()

As the expresssion

*((bool *) pageData->delta)

is used more than once, I think a separate (bool *) variable should be used.

--
Antonin Houska
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de, http://www.cybertec.at

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Paul Ramsey 2017-11-16 17:37:29 Re: Inlining functions with "expensive" parameters
Previous Message Robert Haas 2017-11-16 17:14:42 Re: [HACKERS] parallelize queries containing initplans