Re: WAL format and API changes (9.5)

From: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: WAL format and API changes (9.5)
Date: 2014-11-05 21:08:31
Message-ID: 545A91CF.7030000@vmware.com
Lists: pgsql-hackers

On 10/30/2014 09:19 PM, Andres Freund wrote:
> Some things I noticed while reading the patch:

A lot of good comments, but let me pick up just two that are related:

> * There's a couple record types (e.g. XLOG_SMGR_TRUNCATE) that only
> refer to the relation, but not to the block number. These still log
> their rnode manually. Shouldn't we somehow deal with those in a
> similar way explicit block references are dealt with?
>
> * Hm. At least WriteMZeroPageXlogRec (and probably the same for all the
> other slru stuff) doesn't include a reference to the page. Isn't that
> bad? Shouldn't we make XLogRegisterBlock() usable for that case?
> Otherwise I fail to see how pg_rewind like tools can sanely deal with this?

Yeah, there are still operations that modify relation pages, but don't
store the information about the modified pages in the standard format.
That includes XLOG_SMGR_TRUNCATE that you spotted, and XLOG_SMGR_CREATE,
and also XLOG_DBASE_CREATE/DROP. And then there are updates to
non-relation files, like all the slru stuff, relcache init files, etc.
And updates to the FSM and VM bypass the full-page write mechanism too.

To play it safe, pg_rewind copies all non-relation files as is. That
includes all SLRUs, FSM and VM files, and everything else whose filename
doesn't match that of (the main fork of) a relation file. Of course,
that's a fair amount of copying to do, so we might want to optimize that
in the future, but I want to nail the relation files first. They are
usually an order of magnitude larger than the other files, after all.
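
A minimal sketch of the kind of filename test described above, assuming
that a main-fork relation file is named "<digits>" or "<digits>.<digits>"
(relfilenode plus optional segment number). The function name is
hypothetical and this is not pg_rewind's actual code; it looks only at
the base name and ignores directories and temporary relations. Anything
it rejects, such as "16384_fsm" or "16384_vm", would fall into the
"copy as is" bucket.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical helper (not from pg_rewind): returns true if the base file
 * name looks like a main-fork relation segment, i.e. "<digits>" or
 * "<digits>.<digits>". Fork suffixes such as "_fsm" and "_vm" are rejected.
 */
bool
looks_like_main_fork(const char *name)
{
    const char *p = name;

    assert(name != NULL);

    /* relfilenode: at least one digit */
    if (!isdigit((unsigned char) *p))
        return false;
    while (isdigit((unsigned char) *p))
        p++;

    /* no suffix at all: first segment of the main fork */
    if (*p == '\0')
        return true;

    /* otherwise only ".<segment number>" is accepted */
    if (*p != '.')
        return false;
    p++;
    if (!isdigit((unsigned char) *p))
        return false;
    while (isdigit((unsigned char) *p))
        p++;

    return *p == '\0';
}
```

A tool built along these lines would copy a file wholesale whenever this
test returns false, and use the block-level information from the WAL
only for files where it returns true.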

Unfortunately, pg_rewind still needs to recognize and parse the special
WAL records, like XLOG_SMGR_CREATE/TRUNCATE, that modify relation files
outside the normal block registration system. I've been thinking that we
should add another flag to the WAL record format to mark such records.
pg_rewind will still need to understand the format of such records, but
the flag will help to catch bugs of omission. If pg_rewind or another
such tool sees a record that's flagged as "special", but doesn't
recognize the record type, it can throw an error.

- Heikki
