Re: XLog changes for 9.3

From: Andres Freund <andres(at)2ndquadrant(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>
Subject: Re: XLog changes for 9.3
Date: 2012-06-07 14:18:55
Message-ID: 201206071618.55703.andres@2ndquadrant.com
Lists: pgsql-hackers

On Thursday, June 07, 2012 03:50:35 PM Heikki Linnakangas wrote:
> When I worked on the XLogInsert scaling patch, it became apparent that
> some changes to the WAL format would make it a lot easier. So for 9.3,
> I'd like to do some refactoring:

> 1. Use a 64-bit integer instead of the two-variable log/seg
> representation, for identifying a WAL segment. This has no user-visible
> effect, but makes the code a bit simpler.
+1

We can define a sensible InvalidXLogRecPtr instead of doing that locally in
loads of places! Yippee.
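
To illustrate what a single shared definition might look like, here is a rough
sketch. It assumes the record pointer also becomes one 64-bit value and that
byte position 0 is never a valid record start; the names XLogRecPtr64,
InvalidXLogRecPtr64, XLogRecPtr64IsInvalid and XLogRecPtr64FromParts are
hypothetical, not existing symbols.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 64-bit WAL position; all names here are illustrative only. */
typedef uint64_t XLogRecPtr64;

/* Assumption: position 0 never holds a valid record, so it can mean "invalid". */
#define InvalidXLogRecPtr64 ((XLogRecPtr64) 0)

static inline bool
XLogRecPtr64IsInvalid(XLogRecPtr64 ptr)
{
    return ptr == InvalidXLogRecPtr64;
}

/* Fold the current two-part (xlogid, xrecoff) pointer into one value. */
static inline XLogRecPtr64
XLogRecPtr64FromParts(uint32_t xlogid, uint32_t xrecoff)
{
    return ((uint64_t) xlogid << 32) | xrecoff;
}
```

With something like this, callers would compare against one constant instead
of zeroing both halves of a struct by hand.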

> 2. Don't waste the last WAL segment in each logical 4GB file. Currently,
> we skip the WAL segment ending with "FF". The comments claim that
> wasting the last segment "ensures that we don't have problems
> representing last-byte-position-plus-1", but in my experience, it just
> makes things more complicated. You have two ways to represent the
> segment boundary, and some functions are picky on which one is used. For
> example, XLogWrite() assumes that when you want to flush to the end of a
> logical log file, you use the "5/FF000000" representation, not
> "6/00000000". Other functions, like XLogPageRead(), expect the latter.
>
> This is a backwards-incompatible change for external utilities that know
> how the WAL segment numbering works. Hopefully there aren't too many of
> those around.
+1
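
For external tools, the numbering without the skipped segment could look
roughly like the sketch below. It assumes the default 16MB segment size, a
flat 64-bit byte position, and the existing three-part hexadecimal file name
layout; byte_pos_to_segno and segno_to_filename are made-up helper names.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define SEG_SIZE        UINT64_C(0x1000000)                 /* assumed 16MB segments */
#define SEGS_PER_LOGID  (UINT64_C(0x100000000) / SEG_SIZE)  /* 256, none skipped */

/* Segment number containing a flat 64-bit byte position. */
static uint64_t
byte_pos_to_segno(uint64_t pos)
{
    return pos / SEG_SIZE;
}

/* Build a file name from timeline and segment number (hypothetical helper). */
static void
segno_to_filename(char *buf, size_t len, uint32_t tli, uint64_t segno)
{
    snprintf(buf, len, "%08X%08X%08X",
             (unsigned int) tli,
             (unsigned int) (segno / SEGS_PER_LOGID),
             (unsigned int) (segno % SEGS_PER_LOGID));
}
```

Under these assumptions "5/FF000000" is simply the start of segment 0x5FF and
gets a file of its own, and "6/00000000" is the start of the next segment, so
there is only one way to name the boundary.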

> 3. Move the only field, xl_rem_len, from the continuation record header
> straight to the xlog page header, eliminating XLogContRecord altogether.
> This makes it easier to calculate in advance how much space a WAL record
> requires, as it no longer depends on how many pages it has to be split
> across. This wastes 4-8 bytes on every xlog page, but that's not much.
+1. I don't think this will waste a measurable amount in real-world
scenarios. A very large percentage of pages have continuation records.
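
A rough sketch of how a page header carrying the remaining length might be
used by a reader. The struct layout, the field names, the 8192-byte page size
and the 8-byte alignment of record starts are all assumptions made for
illustration; this is not the actual XLogPageHeaderData.

```c
#include <stdint.h>

#define PAGE_SIZE 8192u     /* assumed WAL page size */

/* Hypothetical page header with the continuation length folded in. */
typedef struct SketchPageHeader
{
    uint16_t magic;
    uint16_t info;
    uint32_t tli;
    uint64_t pageaddr;
    uint32_t rem_len;       /* bytes continued from earlier pages, 0 if none */
    uint32_t padding;       /* keeps the header a multiple of 8 bytes */
} SketchPageHeader;

/*
 * Offset of the first record that starts on this page, or 0 if the
 * continued data fills the whole page.
 */
static uint32_t
first_record_offset(const SketchPageHeader *hdr)
{
    uint32_t hdr_size = (uint32_t) sizeof(SketchPageHeader);
    uint32_t payload = PAGE_SIZE - hdr_size;
    uint32_t aligned;

    if (hdr->rem_len >= payload)
        return 0;
    aligned = (hdr->rem_len + 7u) & ~7u;    /* assumed 8-byte alignment */
    if (aligned >= payload)
        return 0;
    return hdr_size + aligned;
}
```

On pages that do carry a continuation, the field holds real information
rather than wasted bytes.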

> 4. Allow WAL record header to be split across page boundaries.
> Currently, if there are less than SizeOfXLogRecord bytes left on the
> current WAL page, it is wasted, and the next record is inserted at the
> beginning of the next page. The problem with that is again that it makes
> it impossible to know in advance exactly how much space a WAL record
> requires, because it depends on how many bytes need to be wasted at the
> end of current page.
+0.5. It's somewhat convenient to be able to look at a record before you have
reassembled it over multiple pages. But it's probably not worth the
implementation complexity.
If we do that, we can remove all the alignment padding as well, which would be
a problem for you anyway, wouldn't it?
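
The current rule quoted above can be sketched as follows, to show why the
space a record needs depends on where on the page it happens to land. The
8192-byte page size is an assumption, the record header size is passed in
rather than hard-coded, and wasted_before_record is a made-up name.

```c
#include <stdint.h>

#define PAGE_SIZE 8192u     /* assumed WAL page size */

/*
 * Bytes skipped at the end of the current page before a new record can
 * start, given the offset within the page and the record header size.
 * Expects page_offset <= PAGE_SIZE.
 */
static uint32_t
wasted_before_record(uint32_t page_offset, uint32_t rec_hdr_size)
{
    uint32_t free_space = PAGE_SIZE - page_offset;

    /* Header does not fit: the tail of the page is wasted. */
    return (free_space < rec_hdr_size) ? free_space : 0;
}
```

If the header may be split across pages, this function would always return 0
and the dependency on the insert position goes away.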

> These changes will help the XLogInsert scaling patch, by making the
> space calculations simpler. In essence, to reserve space for a WAL
> record of size X, you just need to do "bytepos += X". There's a lot
> more details with that, like mapping from the contiguous byte position
> to an XLogRecPtr that takes page headers into account, and noticing
> RedoRecPtr changes safely, but it's a start.
Hm. Wouldn't you need to remove short/long page headers for that as well?
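
For reference, if both header kinds stay, the mapping from a contiguous byte
position to a physical position might look roughly like this. It is only a
sketch: the header sizes of 24 and 40 bytes are placeholder values, the page
and segment sizes are assumed defaults, and usable_to_physical is a made-up
name.

```c
#include <stdint.h>

#define PAGE_SIZE   UINT64_C(8192)       /* assumed WAL page size */
#define SEG_SIZE    UINT64_C(0x1000000)  /* assumed 16MB segments */
#define SHORT_HDR   UINT64_C(24)         /* placeholder short header size */
#define LONG_HDR    UINT64_C(40)         /* placeholder long header size */

#define PAGES_PER_SEG    (SEG_SIZE / PAGE_SIZE)
#define USABLE_PER_PAGE  (PAGE_SIZE - SHORT_HDR)
/* The first page of each segment carries the long header instead. */
#define USABLE_PER_SEG   (PAGES_PER_SEG * USABLE_PER_PAGE - (LONG_HDR - SHORT_HDR))

/* Map a header-free byte position to a physical WAL position. */
static uint64_t
usable_to_physical(uint64_t bytepos)
{
    uint64_t segno = bytepos / USABLE_PER_SEG;
    uint64_t in_seg = bytepos % USABLE_PER_SEG;
    uint64_t first_page_usable = PAGE_SIZE - LONG_HDR;
    uint64_t offset;

    if (in_seg < first_page_usable)
        offset = LONG_HDR + in_seg;
    else
    {
        uint64_t rest = in_seg - first_page_usable;
        uint64_t page = 1 + rest / USABLE_PER_PAGE;

        offset = page * PAGE_SIZE + SHORT_HDR + rest % USABLE_PER_PAGE;
    }
    return segno * SEG_SIZE + offset;
}
```

The special case for the first page of each segment is exactly the extra
complexity that would disappear with a single header size.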

Andres

--
Andres Freund http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
