Re: WAL format and API changes (9.5)

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Amit Kapila <amit(dot)kapila16(at)gmail(dot)com>, Andres Freund <andres(at)2ndquadrant(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: WAL format and API changes (9.5)
Date: 2014-11-14 08:31:08
Message-ID: CAB7nPqR-TUdK+-Uc5-Aodjpdxw3=-hjeV7y-1nKj+qo+O1WQ+A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Nov 13, 2014 at 10:33 PM, Heikki Linnakangas
<hlinnakangas(at)vmware(dot)com> wrote:
> In quick testing, this new WAL format is somewhat more compact than the 9.4
> format. That also seems to have more than bought back the performance
> regression I saw earlier. Here are results from my laptop, using the
> wal-update-testsuite.sh script:
> master:
> [results]
> (27 rows)
> wal-format-and-api-changes-9.patch:
> [results]
> (27 rows)
So based on your series of tests, you are saving 6% up to 10%.
That's... Really cool!

> Aside from the WAL record format changes, this patch adds the "decoded WAL
> record" infrastructure that we talked about with Andres. XLogReader now has
> a new function, DecodeXLogRecord, which parses the block headers etc. from
> the WAL record, and copies the data chunks to aligned buffers. The redo
> routines are passed a pointer to the XLogReaderState, instead of the plain
> XLogRecord, and the redo routines can use macros and functions defined
> xlogreader.h to access the already-decoded WAL record. The new WAL record
> format is difficult to parse in a piece-meal fashion, so it really needs
> this separate decoding pass to be efficient.
>
> Thoughts on this new WAL record format? I've attached the xlogrecord.h file
> here separately for easy reading, if you want to take a quick look at just
> that without applying the whole patch.
The new format is neat, and that's a lot of code.. Grouping together
the blocks of data is a good thing for the FPW compression patch as
well.

Note that this patch conflicts with the recent commits 81c4508 and
34402ae. installcheck-world is passing, and standbys are able to
replay records, at least without crashes.

Here are some more comments:
1) with assertions enabled this does not compile because of a small
typo in xlogreader.h here:
+#define XLogRecHasAnyBlockRefs(decoder) ((decoder)->max_block id >= 0)
2) In xlogreader.c, XLogRecGetBlockData should return char * but a
boolean is returned here:
+XLogRecGetBlockData(XLogReaderState *record, uint8 block_id, Size *len)
+{
+ DecodedBkpBlock *bkpb;
+
+ if (!record->blocks[block_id].in_use)
+ return false;
As no blocks are in use in this case this should be NULL.
3) pg_xlogdump does not seem to work:
$ pg_xlogdump 00000001000000000000000D
pg_xlogdump: FATAL: could not find a valid record after 0/D000000
4) A couple of NOT_USED blocks could be removed, no?
+#ifdef NOT_USED
BlockNumber leftChildBlkno = InvalidBlockNumber;
+#endif
5) Here why not using the 2nd block instead of the 3rd (@_bt_getroot)?
+ XLogBeginInsert();
+ XLogRegisterBuffer(0, rootbuf, REGBUF_WILL_INIT);
+ XLogRegisterBuffer(2, metabuf, REGBUF_WILL_INIT);
6) There are FIXME blocks:
+ // FIXME
+ //if ((bkpb->fork_flags & BKPBLOCK_WILL_INIT) != 0 &&
mode != RBM_ZERO)
+ // elog(PANIC, "block with WILL_INIT flag in WAL
record must be zeroed by redo routine");]
And that:
+ /* FIXME: translation? Although this shouldn't happen.. */
+ ereport(ERROR,
+ (errmsg("error decoding WAL record"),
+ errdetail("%s", errormsg)));
Regards,
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2014-11-14 08:39:33 Re: alter user/role CURRENT_USER
Previous Message Heikki Linnakangas 2014-11-14 08:17:24 Re: Change in HEAP_NEWPAGE logging makes diagnosis harder