Re: Improving compressibility of WAL files

From: Greg Smith <gsmith(at)gregsmith(dot)com>
To: Hannu Krosing <hannu(at)krosing(dot)net>
Cc: Aidan Van Dyk <aidan(at)highrise(dot)ca>, Bruce Momjian <bruce(at)momjian(dot)us>, Kyle Cordes <kyle(at)kylecordes(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgreSQL(dot)org>
Subject: Re: Improving compressibility of WAL files
Date: 2009-01-09 00:12:42
Message-ID: Pine.GSO.4.64.0901081850420.2578@westnet.com
Lists: pgsql-general pgsql-hackers

On Fri, 9 Jan 2009, Hannu Krosing wrote:

> won't it still be easier/less intrusive on inline core functionality and
> more flexible to just record end-of-valid-wal somewhere and then let the
> compressor discard the invalid part when compressing and recreate it
> with zeros on decompression ?

I thought at one point that the direction this was going toward was to
provide the size of the WAL file as a parameter you can use in the
archive_command: %p provides the path, %f the file name, and now %l the
length. That makes an example archive command something like:

head -c "%l" "%p" | gzip > /mnt/server/archivedir/"%f"

Expanding it back to a full 16MB segment on the other side might require
some trivial script; I can't think of a standard UNIX tool suitable for
that, but it's easy enough to write. I'm assuming I'm just remembering
someone else's suggestion here; maybe I just invented the above. You
don't want to just modify pg_standby to accept small files, because then
you've made it harder to be absolutely sure when the file is ready to be
processed if a non-atomic copy is being done (a partially copied file and
a legitimately short one look the same). It may also make sense to
provide some simple C implementations of the clear/expand tools in
contrib even with the %l addition, mainly to help out Windows users.
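A minimal sketch of both halves of that round trip, assuming the proposed
%l placeholder existed and segments use the default 16MB size. The
function names archive_wal and restore_wal and the optional segment-size
argument are hypothetical, not anything PostgreSQL ships. head -c is
supported by GNU and BSD userlands but is not strictly POSIX. Both sides
write to a temporary name and rename at the end, so the final file name
only appears once the file is complete, which sidesteps the non-atomic
copy problem as long as the rename stays within one filesystem.

```shell
#!/bin/sh
# Sketch only: assumes the proposed %l (valid WAL length) placeholder and the
# default 16MB segment size. Function names are hypothetical.
set -eu

WAL_SEGSIZE=16777216  # default WAL segment size in bytes (16MB)

# Archive side: keep only the first LEN valid bytes of the segment, compress
# them, and rename into place so a reader never sees a partial file.
# Args: WALPATH (%p)  FNAME (%f)  LEN (%l)  ARCHDIR
archive_wal() {
    walpath=$1 fname=$2 len=$3 archdir=$4
    head -c "$len" "$walpath" | gzip > "$archdir/$fname.tmp"
    mv "$archdir/$fname.tmp" "$archdir/$fname"
}

# Restore side: decompress an archived segment and pad it with zero bytes
# back to the full segment size, again via a temp file plus rename.
# Args: SRC  DEST  [SEGSIZE]  (SEGSIZE is a hypothetical override)
restore_wal() {
    src=$1 dest=$2 segsize=${3:-$WAL_SEGSIZE}
    gzip -dc "$src" > "$dest.tmp"
    # Strip the leading spaces some wc implementations print
    size=$(wc -c < "$dest.tmp" | tr -d ' ')
    if [ "$size" -gt "$segsize" ]; then
        echo "restore_wal: $src is larger than $segsize bytes" >&2
        rm -f "$dest.tmp"
        return 1
    fi
    # Append (segsize - size) zero bytes to recreate the unused tail
    head -c "$((segsize - size))" /dev/zero >> "$dest.tmp"
    mv "$dest.tmp" "$dest"
}
```

Wrapped in a script, the archive side would be called with the %p, %f
and %l values plus the archive directory, and the restore side with the
archived file and the destination path, so pg_standby would still only
ever see complete 16MB segments.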

To reiterate the choices I remember popping up in the multiple rounds this
has come up, possible implementations that would work for this general
requirement include:

1) Provide the length as part of the archive command
2) Add a more explicit end-of-WAL delimiter
3) Write zeros to the unused portion in the server
4) pglesslog
5) pg_clearxlogtail

There's also "(6) use sync rep", which isn't quite a perfect answer;
there are certainly WAN-based use cases where you don't want full sync
rep but do want the WAL to compress as much as possible.

I think (1) is a better solution than most of these in the context of an
improvement to core, with (4) pglesslog being the main other contender
because of how it provides additional full-page write improvements.

--
* Greg Smith gsmith(at)gregsmith(dot)com http://www.gregsmith.com Baltimore, MD
