Re: WAL format and API changes (9.5)

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: Heikki Linnakangas <hlinnakangas(at)vmware(dot)com>
Cc: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Subject: Re: WAL format and API changes (9.5)
Date: 2014-11-05 22:38:45
Message-ID: 20141105223845.GA28295@alap3.anarazel.de
Lists: pgsql-hackers

On 2014-11-05 23:08:31 +0200, Heikki Linnakangas wrote:
> On 10/30/2014 09:19 PM, Andres Freund wrote:
> >Some things I noticed while reading the patch:
>
> A lot of good comments, but let me pick up just two that are related:
>
> >* There's a couple record types (e.g. XLOG_SMGR_TRUNCATE) that only
> > refer to the relation, but not to the block number. These still log
> > their rnode manually. Shouldn't we somehow deal with those in a
> > similar way explicit block references are dealt with?
> >
> >* Hm. At least WriteMZeroPageXlogRec (and probably the same for all the
> > other slru stuff) doesn't include a reference to the page. Isn't that
> > bad? Shouldn't we make XLogRegisterBlock() usable for that case?
> > Otherwise I fail to see how pg_rewind like tools can sanely deal with this?
>
> Yeah, there are still operations that modify relation pages, but don't store
> the information about the modified pages in the standard format. That
> includes XLOG_SMGR_TRUNCATE that you spotted, and XLOG_SMGR_CREATE, and also
> XLOG_DBASE_CREATE/DROP. And then there are updates to non-relation files,
> like all the slru stuff, relcache init files, etc. And updates to the FSM
> and VM bypass the full-page write mechanism too.

That's an awful number of special cases. I see little reason not to
invent something that can reference at least the relation in those
cases. Then we can at least resync only the relations that actually
changed. E.g. for the VM that's not necessarily all that likely in an
insert-mostly case.

I'm not that worried about things like create/drop database, but the
FSM, the VM, and the various SLRUs are really somewhat essential.

> To play it safe, pg_rewind copies all non-relation files as is. That
> includes all SLRUs, FSM and VM files, and everything else whose filename
> doesn't match the (main fork of) a relation file. Of course, that's a fair
> amount of copying to do, so we might want to optimize that in the future,
> but I want to nail the relation files first. They are usually an order of
> magnitude larger than the other files, after all.

That's fair enough.

> Unfortunately pg_rewind still needs to recognize and parse the special WAL
> records like XLOG_SMGR_CREATE/TRUNCATE, that modify relation files outside
> the normal block registration system. I've been thinking that we should add
> another flag to the WAL record format to mark such records.

So every time pg_rewind comes across a record with that flag set for
which it doesn't have special-case code, it'd balk? Sounds sensible.
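
To make that concrete, here is a minimal sketch of the "balk on flagged
records we don't know" check. Everything in it is hypothetical: the flag
bit, the simplified record struct, and all SKETCH_* names are made up
for illustration and are neither the patch's record format nor
pg_rewind's actual code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical flag: record modifies relation files outside the
 * normal block registration system. */
#define SKETCH_FLAG_SPECIAL_REL_UPDATE 0x01

/* Hypothetical resource manager ids and info codes. */
#define SKETCH_RM_HEAP        0
#define SKETCH_RM_SMGR        1
#define SKETCH_RM_DBASE       2
#define SKETCH_SMGR_CREATE    0x10
#define SKETCH_SMGR_TRUNCATE  0x20

/* Hypothetical, heavily simplified record header. */
typedef struct SketchRecord
{
    uint8_t rmid;   /* resource manager */
    uint8_t info;   /* record type within the resource manager */
    uint8_t flags;  /* SKETCH_FLAG_* bits */
} SketchRecord;

/*
 * Returns true if a rewind-like tool can safely process the record,
 * false if it has to balk.
 */
bool
sketch_can_handle(const SketchRecord *rec)
{
    /* Unflagged records only touch pages via block references. */
    if ((rec->flags & SKETCH_FLAG_SPECIAL_REL_UPDATE) == 0)
        return true;

    /* Flagged records need explicit special-case code. */
    if (rec->rmid == SKETCH_RM_SMGR &&
        (rec->info == SKETCH_SMGR_CREATE ||
         rec->info == SKETCH_SMGR_TRUNCATE))
        return true;

    /* Flag set, but no special-case code: refuse rather than guess. */
    return false;
}
```

The point of the design is that a newly added record type with the flag
set makes the tool fail loudly instead of silently missing a change.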

Personally I think we should at least have a generic format to refer to
entire relations without a specific block number. And one to refer to
SLRUs. I don't think we necessarily need to implement them now, but we
should make sure there's bit space left to denote them.
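
As a rough illustration of "leaving bit space", one could imagine a
one-byte reference header whose top two bits say what kind of thing is
referenced. This layout is purely an assumption for the sake of the
example (the kinds, masks, and function names are all invented here);
it is not what the patch does.

```c
#include <stdint.h>

/* Hypothetical reference kinds; the last value is left unassigned. */
typedef enum SketchRefKind
{
    SKETCH_REF_BLOCK = 0,     /* relation + fork + block number */
    SKETCH_REF_RELATION = 1,  /* whole relation, no block number */
    SKETCH_REF_SLRU = 2,      /* SLRU + page number */
    SKETCH_REF_RESERVED = 3   /* spare, for future use */
} SketchRefKind;

#define SKETCH_REF_KIND_SHIFT 6
#define SKETCH_REF_KIND_MASK  0xC0  /* top two bits: kind */
#define SKETCH_REF_ID_MASK    0x3F  /* low six bits: reference id */

/* Pack a kind and a reference id (0..63) into one header byte. */
uint8_t
sketch_ref_pack(SketchRefKind kind, uint8_t id)
{
    return (uint8_t) (((unsigned) kind << SKETCH_REF_KIND_SHIFT) |
                      (id & SKETCH_REF_ID_MASK));
}

/* Extract the kind from a header byte. */
SketchRefKind
sketch_ref_kind(uint8_t header)
{
    return (SketchRefKind) ((header & SKETCH_REF_KIND_MASK) >>
                            SKETCH_REF_KIND_SHIFT);
}

/* Extract the reference id from a header byte. */
uint8_t
sketch_ref_id(uint8_t header)
{
    return (uint8_t) (header & SKETCH_REF_ID_MASK);
}
```

Only the block kind would need to be implemented at first; the other
values just reserve room so that relation-level and SLRU references can
be added later without changing the header layout.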

Greetings,

Andres Freund

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
