Skip site navigation (1) Skip section navigation (2)

Re: [HACKERS] Full page writes improvement, code update

From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>, Hannu Krosing <hannu(at)skype(dot)net>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Full page writes improvement, code update
Date: 2007-04-20 09:13:52
Message-ID: (view raw, whole thread or download thread mbox)
Lists: pgsql-hackerspgsql-patches

I agree that pg_compresslog should be aware of all the WAL records' 
details so that it can optimize archive log safely.   In my patch, I've 
examined 8.2's WAL records to make pg_compresslog/pg_decompresslog safe.

Also I agree further pg_compresslog maintenance needs to examine changes 
in WAL record format.   Because the number of such format will be 
limited, I think the amount of work will be reasonable enough.


Simon Riggs wrote:
> On Fri, 2007-04-13 at 10:36 -0400, Tom Lane wrote:
>> "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at> writes:
>>> But you also turn off the optimization that avoids writing regular
>>> WAL records when the info is already contained in a full-page image
>>> (increasing the uncompressed size of WAL).
>>> It was that part I questioned.
> I think its right to question it, certainly.
>> That's what bothers me about this patch, too.  It will be increasing
>> the cost of writing WAL (more data -> more CRC computation and more
>> I/O, not to mention more contention for the WAL locks) which translates
>> directly to a server slowdown.
> I don't really understand this concern. Koichi-san has included a
> parameter setting that would prevent any change at all in the way WAL is
> written. If you don't want this slight increase in WAL, don't enable it.
> If you do enable it, you'll also presumably be compressing the xlog too,
> which works much better than gzip using less CPU. So overall it saves
> more than it costs, ISTM, and nothing at all if you choose not to use
> it.
>> The main arguments that I could see against Andreas' alternative are:
>> 1. Some WAL record types are arranged in a way that actually would not
>> permit the reconstruction of the short form from the long form, because
>> they throw away too much data when the full-page image is substituted.
>> An example that's fresh in my mind is that the current format of the
>> btree page split WAL record discards newitemoff in that case, so you
>> couldn't identify the inserted item in the page image.  Now this is only
>> saving two bytes in what's usually going to be a darn large record
>> anyway, and it complicates the code to do it, so I wouldn't cry if we
>> changed btree split to include newitemoff always.  But there might be
>> some other cases where more data is involved.  In any case, someone
>> would have to look through every single WAL record type to determine
>> whether reconstruction is possible and fix it if not.
>> 2. The compresslog utility would have to have specific knowledge about
>> every compressible WAL record type, to know how to convert it to the
>> short format.  That means an ongoing maintenance commitment there.
>> I don't think this is unacceptable, simply because we need only teach
>> it about a few common record types, not everything under the sun ---
>> anything it doesn't know how to fix, just leave alone, and if it's an
>> uncommon record type it really doesn't matter.  (I guess that means
>> that we don't really have to do #1 for every last record type, either.)
>> So I don't think either of these is a showstopper.  Doing it this way
>> would certainly make the patch more acceptable, since the argument that
>> it might hurt rather than help performance in some cases would go away.
> Yeh, its additional code paths, but it sounds like Koichi-san and
> colleagues are going to be trail blazing any bugs there and will be
> around to fix any more that emerge.
>>> What about disconnecting WAL LSN from physical WAL record position
>>> during replay ?
>>> Add simple short WAL records in pg_compresslog like: advance LSN by 8192
>>> bytes.
>> I don't care for that, as it pretty much destroys some of the more
>> important sanity checks that xlog replay does.  The page boundaries
>> need to match the records contained in them.  So I think we do need
>> to have pg_decompresslog insert dummy WAL entries to fill up the
>> space saved by omitting full pages.
> Agreed. I don't want to start touching something that works so well.
> We've been thinking about doing this for at least 3 years now, so I
> don't see any reason to baulk at it now. I'm happy with Koichi-san's
> patch as-is, assuming further extensive testing will be carried out on
> it during beta.

Koichi Suzuki

In response to

pgsql-hackers by date

Next:From: Koichi SuzukiDate: 2007-04-20 09:19:40
Subject: Re: [HACKERS] Full page writes improvement, code update
Previous:From: Gregory StarkDate: 2007-04-20 08:56:47
Subject: Re: Improving deadlock error messages

pgsql-patches by date

Next:From: Koichi SuzukiDate: 2007-04-20 09:19:40
Subject: Re: [HACKERS] Full page writes improvement, code update
Previous:From: Zoltan BoszormenyiDate: 2007-04-20 08:37:23
Subject: Re: parser dilemma

Privacy Policy | About PostgreSQL
Copyright © 1996-2017 The PostgreSQL Global Development Group