XLog changes for 9.3

From: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
To: PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: XLog changes for 9.3
Date: 2012-06-07 13:50:35
Message-ID: 4FD0B1AB.3090405@enterprisedb.com
Lists: pgsql-hackers

When I worked on the XLogInsert scaling patch, it became apparent that 
some changes to the WAL format would make it a lot easier. So for 9.3, 
I'd like to do some refactoring:

1. Use a 64-bit integer instead of the two-variable log/seg 
representation to identify a WAL segment. This has no user-visible 
effect, but makes the code a bit simpler.

2. Don't waste the last WAL segment in each logical 4GB file. Currently, 
we skip the WAL segment ending with "FF". The comments claim that 
wasting the last segment "ensures that we don't have problems 
representing last-byte-position-plus-1", but in my experience, it just 
makes things more complicated. You have two ways to represent the 
segment boundary, and some functions are picky about which one is used. For
example, XLogWrite() assumes that when you want to flush to the end of a 
logical log file, you use the "5/FF000000" representation, not 
"6/00000000". Other functions, like XLogPageRead(), expect the latter.

This is a backwards-incompatible change for external utilities that know 
how the WAL segment numbering works. Hopefully there aren't too many of 
those around.

3. Move the only field, xl_rem_len, from the continuation record header 
straight to the xlog page header, eliminating XLogContRecord altogether. 
This makes it easier to calculate in advance how much space a WAL record 
requires, as it no longer depends on how many pages it has to be split 
across. This wastes 4-8 bytes on every xlog page, but that's not much.

4. Allow WAL record header to be split across page boundaries. 
Currently, if fewer than SizeOfXLogRecord bytes are left on the 
current WAL page, that space is wasted, and the next record is inserted at the 
beginning of the next page. The problem with that is again that it makes 
it impossible to know in advance exactly how much space a WAL record 
requires, because it depends on how many bytes need to be wasted at the 
end of the current page.
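Items 1 and 2 can be illustrated with a small standalone sketch of the segment arithmetic. It assumes 16 MB segments and uses hypothetical names (OldRecPtr, old_segno, new_segno, segment_file_name), not the actual PostgreSQL macros. Under the old scheme, segment FF of each 4GB logical file is never used, so "5/FF000000" and "6/00000000" land on the same running segment count; with a plain 64-bit position they fall into two consecutive segments, and the segment number splits back into the log/seg halves of the 24-hex-digit file name.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: 16 MB WAL segments. */
#define WAL_SEG_SIZE ((uint32_t) 16 * 1024 * 1024)

/* Old scheme: the last ("FF") segment of each 4 GB logical file is
 * skipped, so only 0xFFFFFFFF / 16 MB = 255 segments per file are used. */
#define OLD_SEGS_PER_FILE (UINT32_C(0xFFFFFFFF) / WAL_SEG_SIZE)

/* New scheme: all 2^32 / 16 MB = 256 segments per logical file are used. */
#define NEW_SEGS_PER_FILE ((UINT64_C(1) << 32) / WAL_SEG_SIZE)

/* Hypothetical old-style two-part WAL pointer. */
typedef struct
{
    uint32_t xlogid;   /* logical 4 GB file ("log") */
    uint32_t xrecoff;  /* byte offset within that logical file */
} OldRecPtr;

/* Old scheme: flatten (log, offset) into a running segment count.
 * Because segment FF is skipped, (5, 0xFF000000) and (6, 0) collide. */
static uint64_t
old_segno(OldRecPtr p)
{
    return (uint64_t) p.xlogid * OLD_SEGS_PER_FILE + p.xrecoff / WAL_SEG_SIZE;
}

/* New scheme: a WAL position is one 64-bit byte offset, and the segment
 * number is a single division, with exactly one representation. */
static uint64_t
new_segno(uint64_t recptr)
{
    return recptr / WAL_SEG_SIZE;
}

/* Format timeline + 64-bit segment number as a 24-hex-digit file name,
 * splitting the segment number into its "log" and "seg" halves. */
static void
segment_file_name(char *buf, size_t len, uint32_t tli, uint64_t segno)
{
    assert(len >= 25);  /* 24 hex digits plus the terminating NUL */
    snprintf(buf, len, "%08X%08X%08X", (unsigned) tli,
             (unsigned) (segno / NEW_SEGS_PER_FILE),
             (unsigned) (segno % NEW_SEGS_PER_FILE));
}
```

For example, old_segno() returns 1530 for both (5, 0xFF000000) and (6, 0), while new_segno() returns 0x5FF and 0x600 for the corresponding 64-bit positions; on timeline 1, segment 0x5FF formats as 0000000100000005000000FF, a file name the old scheme never produced.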

These changes will help the XLogInsert scaling patch, by making the 
space calculations simpler. In essence, to reserve space for a WAL 
record of size X, you just need to do "bytepos += X". There are a lot 
more details to it, like mapping from the contiguous byte position 
to an XLogRecPtr that takes page headers into account, and noticing 
RedoRecPtr changes safely, but it's a start.
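A minimal sketch of the "bytepos += X" idea, assuming items 3 and 4 are in place so a record's size no longer depends on where it starts. Positions are counted in usable bytes (page headers excluded), and page headers are added back only when converting to a physical WAL address. The 8 kB page and 16 MB segment sizes are common defaults; the header sizes (24 and 40 bytes), the 8-byte MAXALIGN, and every name here are assumptions for illustration. The reservation function is not thread-safe; a real server would need a lock or an atomic operation around it.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes, for illustration only. */
#define WAL_PAGE_SIZE 8192u
#define WAL_SEG_SIZE  (16u * 1024 * 1024)
#define SHORT_PHD     24u  /* hypothetical page header, incl. rem_len field */
#define LONG_PHD      40u  /* hypothetical header on a segment's first page */
#define MAXALIGN(x)   (((uint64_t) (x) + 7) & ~(uint64_t) 7)

#define USABLE_BYTES_IN_PAGE (WAL_PAGE_SIZE - SHORT_PHD)
#define USABLE_BYTES_IN_SEGMENT \
    ((WAL_SEG_SIZE / WAL_PAGE_SIZE) * USABLE_BYTES_IN_PAGE - (LONG_PHD - SHORT_PHD))

/* Hypothetical insert position in usable bytes. NOT thread-safe. */
static uint64_t CurrBytePos = 0;

/* Reserve space for a record: just bytepos += MAXALIGN(size), because no
 * continuation headers or wasted page tails have to be accounted for. */
static uint64_t
reserve_wal_space(uint32_t size)
{
    uint64_t start = CurrBytePos;

    CurrBytePos += MAXALIGN(size);
    return start;
}

/* Map a contiguous usable-byte position to a physical WAL byte position,
 * skipping the long header on each segment's first page and the short
 * header on every other page. */
static uint64_t
bytepos_to_recptr(uint64_t bytepos)
{
    uint64_t fullsegs = bytepos / USABLE_BYTES_IN_SEGMENT;
    uint64_t left = bytepos % USABLE_BYTES_IN_SEGMENT;
    uint64_t seg_offset;

    if (left < WAL_PAGE_SIZE - LONG_PHD)
        seg_offset = LONG_PHD + left;          /* on the segment's first page */
    else
    {
        left -= WAL_PAGE_SIZE - LONG_PHD;      /* consume the first page */
        seg_offset = WAL_PAGE_SIZE
            + (left / USABLE_BYTES_IN_PAGE) * WAL_PAGE_SIZE
            + SHORT_PHD + left % USABLE_BYTES_IN_PAGE;
    }
    assert(seg_offset < WAL_SEG_SIZE);
    return fullsegs * WAL_SEG_SIZE + seg_offset;
}
```

With these assumed sizes, usable position 0 maps to byte 40 (just past the first page's long header), position 8152 maps to 8216 (just past the second page's short header), and the first usable byte of the second segment maps to 16777216 + 40.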

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
