Re: Compression of full-page-writes

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Dimitri Fontaine <dimitri(at)2ndquadrant(dot)fr>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Compression of full-page-writes
Date: 2013-10-10 18:44:01
Message-ID: CAHGQGwG=aSraHUSc5dR7jywWreNoUYsgg4MD-CPDxxZ-XShF2w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Oct 11, 2013 at 1:20 AM, Dimitri Fontaine
<dimitri(at)2ndquadrant(dot)fr> wrote:
> Hi,
>
> I did a partial review of this patch, wherein I focused on the patch and
> the code itself, as I saw other contributors already did some testing on
> it, so that we know it applies cleanly and work to some good extend.

Thanks a lot!

> In full_page_writes_str() why are you returning "unrecognized" rather
> than doing an ELOG(ERROR, …) for this unexpected situation?

It's because the similar functions 'wal_level_str' and 'dbState' also return
'unrecognized' in the unexpected situation. I just implemented
full_page_writes_str()
in the same manner.

If we do an elog(ERROR) in that case, pg_xlogdump would fail to dump
the 'broken' (i.e., unrecognized fpw is set) WAL file. I think that some
users want to use pg_xlogdump to investigate the broken WAL file, so
doing an elog(ERROR) seems not good to me.

> The code switches to compression (or trying to) when the following
> condition is met:
>
> + if (fpw <= FULL_PAGE_WRITES_COMPRESS)
> + {
> + rdt->data = CompressBackupBlock(page, BLCKSZ - bkpb->hole_length, &(rdt->len));
>
> We have
>
> + typedef enum FullPageWritesLevel
> + {
> + FULL_PAGE_WRITES_OFF = 0,
> + FULL_PAGE_WRITES_COMPRESS,
> + FULL_PAGE_WRITES_ON
> + } FullPageWritesLevel;
>
> + #define FullPageWritesIsNeeded(fpw) (fpw >= FULL_PAGE_WRITES_COMPRESS)
>
> I don't much like using the <= test against and ENUM and I'm not sure I
> understand the intention you have here. It somehow looks like a typo and
> disagrees with the macro.

I thought that FPW should be compressed only when full_page_writes is
set to 'compress' or 'off'. That is, 'off' implies a compression. When it's set
to 'off', FPW is basically not generated, so there is no need to call
CompressBackupBlock() in that case. But only during online base backup,
FPW is forcibly generated even when it's set to 'off'. So I used the check
"fpw <= FULL_PAGE_WRITES_COMPRESS" there.

> What about using the FullPageWritesIsNeeded
> macro, and maybe rewriting the macro as
>
> #define FullPageWritesIsNeeded(fpw) \
> (fpw == FULL_PAGE_WRITES_COMPRESS || fpw == FULL_PAGE_WRITES_ON)

I'm OK to change the macro so that the <= test is not used.

> Also, having "on" imply "compress" is a little funny to me. Maybe we
> should just finish our testing and be happy to always compress the full
> page writes. What would the downside be exactly (on buzy IO system
> writing less data even if needing more CPU will be the right trade-off).

"on" doesn't imply "compress". When full_page_writes is set to "on",
FPW is not compressed at all.

> I like that you're checking the savings of the compressed data with
> respect to the uncompressed data and cancel the compression if there's
> no gain. I wonder if your test accounts for enough padding and headers
> though given the results we saw in other tests made in this thread.

I'm afraid that the patch has only limited effects in WAL reduction and
performance improvement unless the database contains highly-compressible
data like large blank characters column. It really depends on the contents
of the database. So, obviously FPW compression should not be the default.
Maybe we can treat it as just tuning knob.

> Why do we have both the static function full_page_writes_str() and the
> macro FullPageWritesStr, with two different implementations issuing
> either "true" and "false" or "on" and "off"?

First I was thinking to use "on" and "off" because they are often used
as the setting value of boolean GUC. But unfortunately the existing
pg_xlogdump uses "true" and "false" to show the value of full_page_writes
in WAL. To avoid breaking the backward compatibility, I implmented
the "true/false" version of function. I'm really not sure how many people
want such a compatibility of pg_xlogdump, though.

> ! unsigned hole_offset:15, /* number of bytes before "hole" */
> ! flags:2, /* state of a backup block, see below */
> ! hole_length:15; /* number of bytes in "hole" */
>
> I don't understand that. I wanted to use that patch as a leverage to
> smoothly discover the internals of our WAL system but won't have the
> time to do that here.

We need the flag indicating whether each FPW is compressed or not.
If no such a flag exists in WAL, the standby cannot determine whether
it should decompress each FPW or not, and then cannot replay
the WAL containing FPW properly. That is, I just used a 'space' in
the header of FPW to have such a flag.

> That said, I don't even know that C syntax.

The struct 'ItemIdData' uses the same C syntax.

> + #define BKPBLOCK_UNCOMPRESSED 0 /* uncompressed */
> + #define BKPBLOCK_COMPRESSED 1 /* comperssed */
>
> There's a typo in the comment above.

Yep.

>> [time required to replay WAL generated during running pgbench]
>> 61s (on) .... 1209911 transactions were replayed,
>> recovery speed: 19834.6 transactions/sec
>> 39s (compress) .... 1445446 transactions were replayed,
>> recovery speed: 37062.7 transactions/sec
>> 37s (off) .... 1629235 transactions were replayed,
>> recovery speed: 44033.3 transactions/sec
>
> How did you get those numbers ? pg_basebackup before the test and
> archiving, then a PITR maybe? Is it possible to do the same test with
> the same number of transactions to replay, I guess using the -t
> parameter rather than the -T one for this testing.

Sure. To be honest, when I received the same request from Andres,
I did that benchmark. But unfortunately because of machine trouble,
I could not report it, yet. Will do that again.

Regards,

--
Fujii Masao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Geoghegan 2013-10-10 18:44:04 Re: Auto-tuning work_mem and maintenance_work_mem
Previous Message Robert Haas 2013-10-10 18:43:22 Re: Auto-tuning work_mem and maintenance_work_mem