
Re: [HACKERS] Full page writes improvement, code update

From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: Zeugswetter Andreas ADI SD <ZeugswetterA(at)spardat(dot)at>
Cc: Hannu Krosing <hannu(at)skype(dot)net>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Full page writes improvement, code update
Date: 2007-04-20 06:00:15
Lists: pgsql-hackers, pgsql-patches

Sorry, I was very late in finding this.

Using the DBT-2 benchmark, I have already compared the amount of WAL
generated.  The results were as follows:

Amount of WAL after a 60-minute run of the DBT-2 benchmark:
  wal_add_optimization_info = off (default):  3.13GB
  wal_add_optimization_info = on (new case):  3.17GB
    -> can be reduced to 0.31GB by pg_compresslog.

So the difference is only a little over one percent (3.13GB vs. 3.17GB).
I think this is a very good figure.
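
As a rough check of the figures above, the following sketch recomputes the overhead of turning the flag on and the size of the compressed log relative to the raw WAL. Only the three sizes come from the benchmark result; the variable and function names are illustrative.

```python
# WAL volumes reported for the 60-minute DBT-2 run, in GB.
wal_off_gb = 3.13      # wal_add_optimization_info = off
wal_on_gb = 3.17       # wal_add_optimization_info = on
compressed_gb = 0.31   # after pg_compresslog


def percent_increase(base, new):
    """Relative growth of `new` over `base`, in percent."""
    return (new - base) / base * 100.0


def percent_of(part, whole):
    """Size of `part` as a percentage of `whole`."""
    return part / whole * 100.0


overhead_pct = percent_increase(wal_off_gb, wal_on_gb)
compressed_pct = percent_of(compressed_gb, wal_on_gb)

print(f"extra WAL with the flag on: {overhead_pct:.1f}%")
print(f"compressed log relative to raw WAL: {compressed_pct:.1f}%")
```

With these inputs the extra WAL is about 1.3%, and the compressed log is a little under a tenth of the raw WAL.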

For reference:
DB size: 12.35GB (120 warehouses)
Checkpoint timeout: 60min.  A checkpoint occurred only once during the run.


I don't think replacing the LSN would work.  For full recovery up to the
current time, we need both the archive log and the WAL.  Replacing the LSN
would make the archive log's LSNs inconsistent with the WAL's LSNs, and
recovery would not work.

Reconstruction into regular WAL is what pg_decompresslog is proposed for.
We must be careful that the redo routines are not confused by the
dummy full page writes, as Simon suggested.  So far, it works fine.
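
The LSN argument can be illustrated with a toy model. This is only a sketch under stated assumptions, not the actual pg_compresslog or pg_decompresslog format: a log is modelled as a list of records, a record's LSN is taken to be its byte offset, and the `Record` type, the `marker` kind, and `MARKER_SIZE` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    kind: str         # "normal", "fpw" (removable full page write), "marker", "dummy"
    length: int       # bytes the record occupies in the stream
    removed: int = 0  # for "marker": size of the full page write it replaced


MARKER_SIZE = 16  # hypothetical size of the placeholder kept in the archive


def lsns(records):
    """Byte offset of each record, standing in for its LSN."""
    offsets, pos = [], 0
    for rec in records:
        offsets.append(pos)
        pos += rec.length
    return offsets


def compress(records):
    """Replace each removable full page write with a small marker."""
    return [Record("marker", MARKER_SIZE, removed=r.length) if r.kind == "fpw" else r
            for r in records]


def decompress(records):
    """Expand each marker into a dummy record of the original size."""
    return [Record("dummy", r.removed) if r.kind == "marker" else r
            for r in records]


wal = [Record("normal", 64), Record("fpw", 8192), Record("normal", 128),
       Record("fpw", 8192), Record("normal", 96)]
archived = compress(wal)
restored = decompress(archived)

print("original LSNs:", lsns(wal))
print("archived LSNs:", lsns(archived))   # shifted, so not usable for redo as-is
print("restored LSNs:", lsns(restored))   # match the original again
```

In this model the archived stream is much smaller, but every record after a removed full page write sits at a shifted offset. Padding with dummy records of the removed size puts each record back at its original offset, which is the property the redo routines rely on.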


Zeugswetter Andreas ADI SD wrote:
>>> Yup, this is a good summary.
>>> You say you need to remove the optimization that avoids the logging of
>>> a new tuple because the full page image exists.
>>> I think we must already have the info in WAL which tuple inside the
>>> full page image is new (the one for which we avoided the WAL entry).
>>> How about this:
>>> Leave current WAL as it is and only add the not-removable flag to
>>> full pages.
>>> pg_compresslog then replaces the full page image with a record for the
>>> one tuple that is changed.
>>> I tend to think it is not worth the increased complexity only to save
>>> bytes in the uncompressed WAL though.
>> It is essentially what my patch proposes.  My patch adds a
>> flag to full page writes which "can be" removed.
> Ok, a flag that marks full page images that can be removed is perfect.
> But you also turn off the optimization that avoids writing regular
> WAL records when the info is already contained in a full-page image
> (increasing the uncompressed size of WAL).
> It was that part I questioned. As already stated, maybe I should not
> have, because it would be too complex to reconstruct a regular WAL
> record from the full-page image.
> But that code would also be needed for WAL based partial replication,
> so if it were too complicated we would eventually want a switch to
> turn off the optimization anyway (at least for heap page changes).
>>> Another point about pg_decompresslog:
>>> Why do you need a pg_decompresslog ? Imho pg_compresslog should
>>> already do the replacing of the full_page with the dummy entry. Then
>>> pg_decompresslog could be a simple gunzip, or whatever compression was
>>> used, but no logic.
>> Just removing full page writes does not work.   If we shift the rest of
>> the WAL, then LSN becomes inconsistent in compressed archive logs which
>> pg_compresslog produces.   For recovery, we have to restore LSN as the
>> original WAL.   Pg_decompresslog restores removed full page writes as
>> dummy records so that recovery redo functions won't be confused.
> Ah sorry, I needed some pgsql/src/backend/access/transam/README reading.
> LSN is the physical position of records in WAL. Thus your dummy record
> size is equal to what you cut out of the original record.
> What about disconnecting WAL LSN from physical WAL record position
> during replay ?
> Add simple short WAL records in pg_compresslog like: advance LSN by 8192
> bytes.
> Andreas

Koichi Suzuki
