Re: [HACKERS] Full page writes improvement, code update

From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>, Hannu Krosing <hannu(at)skype(dot)net>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Full page writes improvement, code update
Date: 2007-04-20 09:13:52
Message-ID: 46288450.1050703@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Hi,

I agree that pg_compresslog should be aware of all the WAL records'
details so that it can optimize archive log safely. In my patch, I've
examined 8.2's WAL records to make pg_compresslog/pg_decompresslog safe.

Also I agree further pg_compresslog maintenance needs to examine changes
in WAL record format. Because the number of such format will be
limited, I think the amount of work will be reasonable enough.

Regards;

Simon Riggs wrote:
> On Fri, 2007-04-13 at 10:36 -0400, Tom Lane wrote:
>> "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at> writes:
>>> But you also turn off the optimization that avoids writing regular
>>> WAL records when the info is already contained in a full-page image
>>> (increasing the uncompressed size of WAL).
>>> It was that part I questioned.
>
> I think its right to question it, certainly.
>
>> That's what bothers me about this patch, too. It will be increasing
>> the cost of writing WAL (more data -> more CRC computation and more
>> I/O, not to mention more contention for the WAL locks) which translates
>> directly to a server slowdown.
>
> I don't really understand this concern. Koichi-san has included a
> parameter setting that would prevent any change at all in the way WAL is
> written. If you don't want this slight increase in WAL, don't enable it.
> If you do enable it, you'll also presumably be compressing the xlog too,
> which works much better than gzip using less CPU. So overall it saves
> more than it costs, ISTM, and nothing at all if you choose not to use
> it.
>
>> The main arguments that I could see against Andreas' alternative are:
>>
>> 1. Some WAL record types are arranged in a way that actually would not
>> permit the reconstruction of the short form from the long form, because
>> they throw away too much data when the full-page image is substituted.
>> An example that's fresh in my mind is that the current format of the
>> btree page split WAL record discards newitemoff in that case, so you
>> couldn't identify the inserted item in the page image. Now this is only
>> saving two bytes in what's usually going to be a darn large record
>> anyway, and it complicates the code to do it, so I wouldn't cry if we
>> changed btree split to include newitemoff always. But there might be
>> some other cases where more data is involved. In any case, someone
>> would have to look through every single WAL record type to determine
>> whether reconstruction is possible and fix it if not.
>>
>> 2. The compresslog utility would have to have specific knowledge about
>> every compressible WAL record type, to know how to convert it to the
>> short format. That means an ongoing maintenance commitment there.
>> I don't think this is unacceptable, simply because we need only teach
>> it about a few common record types, not everything under the sun ---
>> anything it doesn't know how to fix, just leave alone, and if it's an
>> uncommon record type it really doesn't matter. (I guess that means
>> that we don't really have to do #1 for every last record type, either.)
>>
>> So I don't think either of these is a showstopper. Doing it this way
>> would certainly make the patch more acceptable, since the argument that
>> it might hurt rather than help performance in some cases would go away.
>
> Yeh, its additional code paths, but it sounds like Koichi-san and
> colleagues are going to be trail blazing any bugs there and will be
> around to fix any more that emerge.
>
>>> What about disconnecting WAL LSN from physical WAL record position
>>> during replay ?
>>> Add simple short WAL records in pg_compresslog like: advance LSN by 8192
>>> bytes.
>> I don't care for that, as it pretty much destroys some of the more
>> important sanity checks that xlog replay does. The page boundaries
>> need to match the records contained in them. So I think we do need
>> to have pg_decompresslog insert dummy WAL entries to fill up the
>> space saved by omitting full pages.
>
> Agreed. I don't want to start touching something that works so well.
>
>
> We've been thinking about doing this for at least 3 years now, so I
> don't see any reason to baulk at it now. I'm happy with Koichi-san's
> patch as-is, assuming further extensive testing will be carried out on
> it during beta.
>

--
-------------
Koichi Suzuki

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Koichi Suzuki 2007-04-20 09:19:40 Re: [HACKERS] Full page writes improvement, code update
Previous Message Gregory Stark 2007-04-20 08:56:47 Re: Improving deadlock error messages

Browse pgsql-patches by date

  From Date Subject
Next Message Koichi Suzuki 2007-04-20 09:19:40 Re: [HACKERS] Full page writes improvement, code update
Previous Message Zoltan Boszormenyi 2007-04-20 08:37:23 Re: parser dilemma