Re: [HACKERS] Full page writes improvement, code update

From: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To: "Koichi Suzuki" <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>, "Hannu Krosing" <hannu(at)skype(dot)net>
Cc: "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, <josh(at)agliodbs(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Full page writes improvement, code update
Date: 2007-04-12 10:59:17
Message-ID: E1539E0ED7043848906A8FF995BDA57901E7BD64@m0143.s-mxs.net
Lists: pgsql-hackers pgsql-patches


> I don't fully understand what "transaction log" means. If it means
> "archived WAL", the current (8.2) code handles WAL as follows:

Probably we can define "transaction log" to be the part of WAL that is
not full pages.

> 1) If full_page_writes=off, then no full page writes will be
> written to WAL, except for those during online backup (between
> pg_start_backup and pg_stop_backup). The WAL size will be
> considerably smaller, but it cannot recover from a
> partial/inconsistent write to the database files. We have to go
> back to the online backup and apply all the archive log.
>
> 2) If full_page_writes=on, then full page writes will be
> written at the first update of a page after each checkpoint,
> plus the full page writes described in 1). Because we have no
> means (in 8.2) to optimize the WAL so far, what we can do is
> to copy WAL or gzip it at archive time.
>
> If we'd like to keep a good chance of recovery after a crash,
> 8.2 provides only method 2), leaving the archive log size
> considerably large. My proposal keeps the chance of
> crash recovery the same as in the case of full_page_writes=on
> and reduces the size of the archived log as in the case of
> full_page_writes=off.

Yup, this is a good summary.

You say you need to remove the optimization that avoids logging a new
tuple when the full page image already exists.
I think the WAL must already contain the information about which tuple
inside the full page image is new (the one for which we avoided the
separate WAL entry).

How about this:
Leave the current WAL as it is and only add the "not removable" flag to
full pages.
pg_compresslog then replaces the full page image with a record for the
one tuple that is changed.
I tend to think it is not worth the increased complexity only to save
bytes in the uncompressed WAL though.
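
To make the suggestion concrete, here is a minimal Python sketch of it.
This is a toy model, not PostgreSQL code: WalRecord, its fields, and
compress_log are hypothetical names made up for illustration, and the
sketch assumes every full-page record also carries enough information
to identify the one tuple that changed.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class WalRecord:
    """Toy WAL record (hypothetical layout, not the real on-disk format)."""
    page: int
    tuple_data: bytes                  # the one tuple this record changed
    full_page: Optional[bytes] = None  # full page image, if one was logged
    not_removable: bool = False        # flag assumed to be set by the backend


def compress_log(records: List[WalRecord]) -> List[WalRecord]:
    """Replace each removable full page image with a tuple-only record."""
    out = []
    for rec in records:
        if rec.full_page is not None and not rec.not_removable:
            # Drop the page image; keep only the changed tuple.
            out.append(WalRecord(rec.page, rec.tuple_data))
        else:
            # Plain records and flagged full pages pass through unchanged.
            out.append(rec)
    return out
```

In this model the backend keeps writing WAL as it does today; all the
extra work sits in the archive-time tool.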

Another point about pg_decompresslog:

Why do you need a pg_decompresslog? IMHO pg_compresslog should already
replace the full page image with the dummy entry. Then pg_decompresslog
could be a simple gunzip, or whatever compression was used, with no
logic of its own.
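
A rough Python sketch of that split, under stated assumptions: the
segment is modelled as a list of (is_removable_full_page, bytes) chunks,
the "dummy entry" is modelled as zero bytes of the same length, and
compress_segment / decompress_segment are hypothetical names. The real
WAL format and the real dummy record are not represented here.

```python
import gzip
from typing import List, Tuple


def compress_segment(chunks: List[Tuple[bool, bytes]]) -> bytes:
    """Swap removable full page images for same-length dummies, then gzip."""
    out = bytearray()
    for is_removable_full_page, data in chunks:
        if is_removable_full_page:
            # Same-length filler keeps every later offset unchanged
            # and compresses to almost nothing.
            out += bytes(len(data))
        else:
            out += data
    return gzip.compress(bytes(out))


def decompress_segment(blob: bytes) -> bytes:
    """No WAL-specific logic: a plain gunzip restores the segment."""
    return gzip.decompress(blob)
```

Because the dummy has the same length as the image it replaces, the
restored segment has the same size as the original, and the restore
side needs nothing beyond the generic decompressor.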

Andreas
