PITR: XLog File compression on Archive

From: "Simon Riggs" <simon(at)2ndquadrant(dot)com>
To: "Bruce Momjian" <pgman(at)candle(dot)pha(dot)pa(dot)us>, "PostgreSQL-development" <pgsql-hackers(at)postgreSQL(dot)org>
Subject: PITR: XLog File compression on Archive
Date: 2004-08-23 21:03:26
Message-ID: NOEFLCFHBPDAFHEIPGBOKECFCDAA.simon@2ndquadrant.com
Lists: pgsql-hackers


One of the possible barriers to adoption of PITR is the volume of the logs
themselves. Maybe this isn't a problem for now, maybe it is.

Re-thinking the whole purpose of the additional full page images appended to
the xlog records, I now understand and agree with Tom's comment in the docs
that we don't need to include those additional full page images for PITR -
they only exist to correctly recover the database in the event of a crash.
The reason for this is that the base backup provides a full set of blocks on
which to recover - there is no danger that the backup contains bad (partially
written) blocks, as there is in crash recovery. It's taken an age for me to
understand that bit, since the actual crash recovery model seems different
from other systems (I should have spotted this earlier).
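To make that argument concrete, here is a toy model in C. It is not PostgreSQL's redo code, and `ToyRecord`, `redo()` and `TOY_PAGE_SIZE` are hypothetical names. A record either carries a full image of the page as it looked after the change, or only the change itself. Replay restores the image when one is present and otherwise applies the change to the page already on disk. The sketch assumes, as the argument above does, that the base backup supplies intact pages.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_PAGE_SIZE 8   /* toy page size; real pages are far larger */

/* Hypothetical decoded record: "set byte `offset` of the page to `value`",
 * optionally carrying a full image of the page as it looked afterwards. */
typedef struct {
    int     has_image;
    uint8_t image[TOY_PAGE_SIZE];
    size_t  offset;
    uint8_t value;
} ToyRecord;

/* Replay one record against one page (hypothetical, for illustration). */
void redo(uint8_t page[TOY_PAGE_SIZE], const ToyRecord *rec)
{
    assert(page != NULL && rec != NULL && rec->offset < TOY_PAGE_SIZE);
    if (rec->has_image)
        memcpy(page, rec->image, TOY_PAGE_SIZE);  /* whole page replaced */
    else
        page[rec->offset] = rec->value;           /* change applied in place */
}
```

Starting from an intact page, replaying the record with or without its image gives the same page. Starting from a torn page, only the version with the image repairs the damage. That is why the images matter for crash recovery but not, under the stated assumption, for replay on top of a base backup.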

As a result, I think there may be a way to remove those pages
from the xlog files immediately before they are copied away to archive,
without affecting crash recovery logic AT ALL. The archiver process could read the
xlog files and re-write them exactly as read to another file, but without
the full page images - writing exactly the current xlog record format. This
would mean that the archived xlog files would become variable length.
Apart from that, not much other code needs to change. The recovery logic
wouldn't need to change at all - the xlog files would just simply never have
full page images to re-apply. The archive logic would need enhancing to do
the read/re-write, but much of that same code needs to be written/adapted
anyway for the offline xlog file reader. The archive code itself would
simply copy to an intermediate file, say ARCHIVEFILE, just like we do on
recovery - so %p in archive_command would still work as before, merely
pointing at the intermediate file, with no other changes.
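The read/re-write step could look roughly like the following C sketch. It assumes a deliberately simplified, hypothetical record layout: a 9-byte header holding the total length, the rmgr data length and a flags byte, followed by the rmgr data and then any backup blocks. This is not the real XLogRecord. The names `strip_backup_blocks()`, `append_record()`, `HDR_SIZE` and `BKP_BLOCK_MASK` are all illustrative. The sketch also ignores the record CRC (which would presumably need recomputing once bytes are removed), xlog page headers, and records that continue across pages, all of which a real archiver would have to handle.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified record layout (NOT the real XLogRecord):
 *   bytes 0-3  tot_len   header + rmgr data + backup blocks (little-endian)
 *   bytes 4-7  data_len  rmgr data only (little-endian)
 *   byte  8    info      flag bits; BKP_BLOCK_MASK marks attached backup blocks
 * followed by data_len bytes of rmgr data, then any backup-block bytes. */
#define HDR_SIZE 9u
#define BKP_BLOCK_MASK 0x0Fu

static uint32_t get_u32(const uint8_t *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static void put_u32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)v;
    p[1] = (uint8_t)(v >> 8);
    p[2] = (uint8_t)(v >> 16);
    p[3] = (uint8_t)(v >> 24);
}

/* Append one record in the toy layout; returns the bytes written. */
size_t append_record(uint8_t *dst, uint8_t info,
                     const uint8_t *data, uint32_t data_len,
                     const uint8_t *image, uint32_t image_len)
{
    uint32_t tot_len = HDR_SIZE + data_len + image_len;

    put_u32(dst, tot_len);
    put_u32(dst + 4, data_len);
    dst[8] = info;
    if (data_len > 0)
        memcpy(dst + HDR_SIZE, data, data_len);
    if (image_len > 0)
        memcpy(dst + HDR_SIZE + data_len, image, image_len);
    return tot_len;
}

/* Copy every record from in[0..in_len) to out, dropping the backup-block
 * bytes and clearing the backup-block flags, so each output record looks
 * as though no full page image had been taken.  out must hold at least
 * in_len bytes and must not overlap in.  Returns 0 on success, -1 if the
 * input is truncated or its lengths are inconsistent. */
int strip_backup_blocks(const uint8_t *in, size_t in_len,
                        uint8_t *out, size_t *out_len)
{
    size_t rpos = 0, wpos = 0;

    assert(in != NULL && out != NULL && out_len != NULL);
    while (rpos < in_len) {
        if (in_len - rpos < HDR_SIZE)
            return -1;                        /* truncated header */

        uint32_t tot_len  = get_u32(in + rpos);
        uint32_t data_len = get_u32(in + rpos + 4);
        uint8_t  info     = in[rpos + 8];

        if (tot_len < HDR_SIZE || data_len > tot_len - HDR_SIZE ||
            tot_len > in_len - rpos)
            return -1;                        /* inconsistent lengths */

        /* Re-write the header with the shorter length and no backup flags. */
        put_u32(out + wpos, HDR_SIZE + data_len);
        put_u32(out + wpos + 4, data_len);
        out[wpos + 8] = (uint8_t)(info & ~BKP_BLOCK_MASK);
        /* Copy the rmgr data unchanged; the full page images are skipped. */
        if (data_len > 0)
            memcpy(out + wpos + HDR_SIZE, in + rpos + HDR_SIZE, data_len);

        wpos += HDR_SIZE + data_len;
        rpos += tot_len;
    }
    *out_len = wpos;
    return 0;
}
```

Because a stripped record is never longer than the original, the output fits in a buffer the size of the input. The resulting file is shorter, which is why the archived files become variable length. Flag bits other than the backup-block ones pass through untouched, so the stripped records look to replay like records that were simply logged without full page images.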

Anyway, the effect of that would be to allow compression of ARCHIVED xlog
files, without affecting crash recovery logic AT ALL - so the full page
images would still exist within locally held xlog files. If we ever mix the
two on recovery, it all still works AFAICS. So, the problem of how we stop
saving away full page images goes away - we still Save them, but we don't
Archive them.

I raise this now because I'm thinking that this functionality really ought
to be in the main production 8.0 release. Not sure if anybody will
agree....but that's what I'm thinking now, based upon what seems like a
simple design to put it there. My rationale is that supporting one archive
file format will be simpler than supporting two, which is what we would
face if we introduced the change at a later time.

...I know, I write it, then we decide.....OK.....

Best Regards, Simon Riggs
