
26.4. WAL Internals

WAL is automatically enabled; no action is required from the administrator except ensuring that the disk-space requirements for the WAL logs are met, and that any necessary tuning is done (see Section 26.3).
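
For example, a quick way to check how much space the WAL logs currently occupy. A minimal sketch, assuming the PGDATA environment variable points at the data directory:

    du -sh "$PGDATA/pg_xlog"    # total size of the WAL directory
    df -h "$PGDATA/pg_xlog"     # free space on the file system holding it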

WAL logs are stored in the directory pg_xlog under the data directory, as a set of segment files, normally each 16 MB in size. Each segment is divided into pages, normally 8 KB each. The log record headers are described in access/xlog.h; the record content is dependent on the type of event that is being logged. Segment files are given ever-increasing numbers as names, starting at 000000010000000000000000. The numbers do not wrap, at present, but it should take a very, very long time to exhaust the available stock of numbers.
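
Since each segment file has a fixed size, a plain directory listing makes this layout easy to verify. A minimal sketch, again assuming $PGDATA points at the data directory; every segment should report exactly 16777216 bytes (16 MB):

    ls -l "$PGDATA/pg_xlog"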

It is advantageous if the log is located on a different disk from the main database files. This can be achieved by moving the directory pg_xlog to another location (while the server is shut down, of course) and creating a symbolic link from the original location in the main data directory to the new location.
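
A sketch of that relocation, assuming $PGDATA points at the data directory and /mnt/fastdisk/pg_xlog is a hypothetical new location on a separate disk:

    pg_ctl stop -D "$PGDATA"                       # shut the server down first
    mv "$PGDATA/pg_xlog" /mnt/fastdisk/pg_xlog     # move the WAL directory
    ln -s /mnt/fastdisk/pg_xlog "$PGDATA/pg_xlog"  # link old location to new
    pg_ctl start -D "$PGDATA"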

The aim of WAL, to ensure that the log is written before database records are altered, may be subverted by disk drives that falsely report a successful write to the kernel, when in fact they have only cached the data and not yet stored it on the disk. A power failure in such a situation may still lead to irrecoverable data corruption. Administrators should try to ensure that disks holding PostgreSQL's WAL log files do not make such false reports.
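
On Linux, one way to inspect and disable a drive's volatile write cache is the hdparm utility. This is only a sketch for ATA drives, with /dev/sda standing in for whichever device actually holds the WAL:

    hdparm -I /dev/sda | grep -i 'write cache'   # report whether the cache is enabled
    hdparm -W 0 /dev/sda                         # disable the volatile write cache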

After a checkpoint has been made and the log flushed, the checkpoint's position is saved in the file pg_control. Therefore, when recovery is to be done, the server first reads pg_control and then the checkpoint record; then it performs the REDO operation by scanning forward from the log position indicated in the checkpoint record. Because the entire content of data pages is saved in the log on the first page modification after a checkpoint, all pages changed since the checkpoint will be restored to a consistent state.
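
The pg_controldata utility prints the contents of pg_control in human-readable form, including the latest checkpoint location and the REDO location from which the forward scan described above begins:

    pg_controldata "$PGDATA"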

To deal with the case where pg_control is corrupted, we should support the possibility of scanning existing log segments in reverse order — newest to oldest — in order to find the latest checkpoint. This has not been implemented yet. pg_control is small enough (less than one disk page) that it is not subject to partial-write problems, and as of this writing there have been no reports of database failures due solely to inability to read pg_control itself. So while it is theoretically a weak spot, pg_control does not seem to be a problem in practice.
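
Should pg_control nonetheless become unreadable, the pg_resetxlog utility can forcibly rebuild it, at the risk of losing recent transactions. A last-resort sketch, assuming the server is stopped; the -n option only prints the values pg_resetxlog would guess, without changing anything:

    pg_resetxlog -n "$PGDATA"   # dry run: report guessed control values only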

Comments


May 12, 2006, 1:17 a.m.

To be reliable, without RAID etc., one would have to write the WAL to more than one disk. Ideally, one copy would go through an ssh pipe to a completely different machine.

It would be good to mention if this is possible.

It would also be good to provide an overview of how to install a basic, medium-reliability system that never loses committed data at any hardware failure point. (Or reference any such description.)

(A cute trick is to write the WAL to a different machine, which has a slave database that reads it back and "recovers" a copy of the master database. PostgreSQL must already contain 95% of the code required to do this. But I assume that it is not supported. (If it is, please reference it from here!))

At the end of the day, cute features are nothing if we cannot be certain of being able to read back the data!


Oct. 10, 2006, 5:55 p.m.

Just as a reply to Anthony: I would think it would be possible to write to as many locations as one might wish via named pipes and commands such as tee. However, I worry about the ability to maintain guaranteed data in a setup of such complexity; i.e., the xlog segments might be incomplete in the event of network errors.
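
A rough sketch of the tee idea described above, with an entirely hypothetical host name and archive paths; as noted, nothing here guarantees that the remote copy is complete:

    SEG=000000010000000000000000
    tee /archive/local/$SEG < "$PGDATA/pg_xlog/$SEG" \
        | ssh standby "cat > /archive/remote/$SEG"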

BTW, I think that Slony-I supports log shipping as you mentioned. :-)
