Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility

From: Chapman Flack <chap(at)anastigmatix(dot)net>
To: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Bruce Momjian <bruce(at)momjian(dot)us>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] AdvanceXLInsertBuffer vs. WAL segment compressibility
Date: 2018-02-25 17:22:08
Message-ID: 5A92F0C0.7070706@anastigmatix.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 07/17/17 11:29, Michael Paquier wrote:
> On Thu, Jul 6, 2017 at 3:48 PM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
>> On 07/03/2017 06:30 PM, Chapman Flack wrote:
>>> Although it's moot in the straightforward approach of re-zeroing in
>>> the loop, it would still help my understanding of the system to know
>>> if there is some subtlety that would have broken what I proposed
>>> earlier, which was an extra flag to AdvanceXLInsertBuffer() that
>>> ...
>>
>> Yeah, I suppose that would work, too.
>
> FWIW, I would rather see any optimization done in
> AdvanceXLInsertBuffer() instead of seeing a second memset re-zeroing
> the WAL page header after its data has been initialized by
> AdvanceXLInsertBuffer() once.

Here is a patch implementing the simpler approach Heikki suggested
(though my original leaning had been to wrench on AdvanceXLInsertBuffer
as Michael suggests). The sheer simplicity of the smaller change
eventually won me over, unless there's a strong objection.

As noted before, the cost of the extra small MemSet is proportional
to the number of unused pages in the segment, and that is an indication
of how busy the system *isn't*. I don't have a time benchmark of the
patch's impact; if I should, what would be a good methodology?

Before the change, what some common compression tools can achieve on
a mostly empty (select pg_switch_wal()) segment on my hardware are:

gzip 27052 in 0.145s
xz 5852 in 0.678s
lzip 5747 in 1.254s
bzip2 1445 in 0.261s

bzip2 is already the clear leader (I don't have lz4 handy to include in
the comparison) at around 1/20th size gzip can achieve, and that's before
this patch. After:

gzip 16393 in 0.143s
xz 2640 in 0.520s
lzip 2535 in 1.198s
bzip2 147 in 0.238s

The patch gives gzip almost an extra factor of two, and about the same
for xz and lzip, and bzip2 gains nearly another order of magnitude.

I considered adding a regression test for unfilled-segment compressibility,
but wasn't sure where it would most appropriately go. I assume a TAP test
would be the way to do it.

-Chap

Attachment Content-Type Size
0001-Zero-headers-of-unused-pages-after-WAL-switch.patch text/x-patch 1.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Lætitia Avrot 2018-02-25 17:51:33 VACUUM FULL name is very confusing to some people (or to most non expert people)
Previous Message Amirouche Boubekki 2018-02-25 17:14:30 neon: a functional database, git for structured data