Re: [HACKERS] Full page writes improvement, code update

From: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
To: "Koichi Suzuki" <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
Cc: "Hannu Krosing" <hannu(at)skype(dot)net>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, <josh(at)agliodbs(dot)com>, <pgsql-hackers(at)postgresql(dot)org>, <pgsql-patches(at)postgresql(dot)org>
Subject: Re: [HACKERS] Full page writes improvement, code update
Date: 2007-04-13 09:21:34
Message-ID: E1539E0ED7043848906A8FF995BDA57901E7BEBE@m0143.s-mxs.net
Lists: pgsql-hackers pgsql-patches


> > Yup, this is a good summary.
> >
> > You say you need to remove the optimization that avoids the logging
> > of a new tuple because the full page image exists.
> > I think we must already have the info in WAL which tuple inside the
> > full page image is new (the one for which we avoided the WAL entry
> > for).
> >
> > How about this:
> > Leave current WAL as it is and only add the not removable flag to
> > full pages.
> > pg_compresslog then replaces the full page image with a record for
> > the one tuple that is changed.
> > I tend to think it is not worth the increased complexity only to
> > save bytes in the uncompressed WAL though.
>
> It is essentially what my patch proposes. My patch adds a
> flag to full page writes which "can be" removed.

Ok, a flag that marks full page images that can be removed is perfect.

But you also turn off the optimization that avoids writing regular
WAL records when the info is already contained in a full-page image
(increasing the uncompressed size of WAL).
It was that part I questioned. As already stated, maybe I should not
have, because it would be too complex to reconstruct a regular WAL
record from the full-page image.
But that code would also be needed for WAL-based partial replication,
so if it were too complicated we would eventually want a switch to
turn off the optimization anyway (at least for heap page changes).
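
To make the trade-off concrete, here is a toy Python sketch (not PostgreSQL code). The names HeapInsert, log_insert, strip_image and wal_size are hypothetical, and the record layout is a simplification I made up for illustration. It only shows why a compressor can drop a full-page image cheaply when the tuple data was logged as well, and cannot when the optimization left the tuple out.

```python
from dataclasses import dataclass, replace
from typing import Optional

PAGE_SIZE = 8192  # assumed block size for this toy model


@dataclass(frozen=True)
class HeapInsert:
    """Hypothetical, heavily simplified WAL record for a heap insert."""
    page: int
    tuple_data: Optional[bytes]   # None if the optimization omitted it
    page_image: Optional[bytes]   # None if no full-page image is attached
    image_removable: bool = False


def log_insert(page: int, tuple_data: bytes,
               page_image: Optional[bytes], keep_tuple: bool) -> HeapInsert:
    """Build a record with the optimization on (keep_tuple=False) or off."""
    if page_image is None:
        return HeapInsert(page, tuple_data, None)
    if keep_tuple:
        # Optimization off: log tuple and image, mark the image removable.
        return HeapInsert(page, tuple_data, page_image, image_removable=True)
    # Optimization on: the image already contains the tuple, so skip it.
    return HeapInsert(page, None, page_image)


def strip_image(rec: HeapInsert) -> HeapInsert:
    """What a compressor could do without any reconstruction logic."""
    if rec.page_image is not None and rec.image_removable:
        return replace(rec, page_image=None, image_removable=False)
    return rec  # image must stay; the tuple exists only inside it


def wal_size(rec: HeapInsert) -> int:
    """Payload bytes only; headers are ignored in this model."""
    return len(rec.tuple_data or b"") + len(rec.page_image or b"")


image = bytes(PAGE_SIZE)
tup = b"x" * 50
with_opt = log_insert(1, tup, image, keep_tuple=False)
without_opt = log_insert(1, tup, image, keep_tuple=True)
print(wal_size(with_opt), wal_size(without_opt),
      wal_size(strip_image(without_opt)))
```

In this model the uncompressed record grows by the tuple size when the optimization is off, but the stripped record shrinks to the tuple alone. With the optimization on, strip_image has to leave the image in place unless it can rebuild the tuple record from the page.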

> > Another point about pg_decompresslog:
> >
> > Why do you need a pg_decompresslog ? Imho pg_compresslog should
> > already do the replacing of the full_page with the dummy entry. Then
> > pg_decompresslog could be a simple gunzip, or whatever compression
> > was used, but no logic.
>
> Just removing full page writes does not work. If we shift the rest
> of the WAL, then LSN becomes inconsistent in compressed archive logs
> which pg_compresslog produces. For recovery, we have to restore LSN
> as the original WAL. Pg_decompresslog restores removed full page
> writes as dummy records so that recovery redo functions won't be
> confused.

Ah sorry, I needed some pgsql/src/backend/access/transam/README reading.

LSN is the physical position of records in WAL. Thus your dummy record
size is equal to what you cut out of the original record.
What about disconnecting WAL LSN from physical WAL record position
during replay ?
Add simple short WAL records in pg_compresslog like: advance LSN by
8192 bytes.
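
A rough Python sketch of what I mean, under loud assumptions: full-page images are modelled as standalone records (in the real WAL they are attached to other records), WAL page headers are ignored, and Record, ADVANCE_RECORD_SIZE, compress and replay_lsns are names invented for this example. The point is only that replay can track a logical LSN as physical position plus the accumulated "advance" amounts, so later records keep their original LSNs without padding the file back to full size.

```python
from dataclasses import dataclass
from typing import List

ADVANCE_RECORD_SIZE = 16  # hypothetical size of the short marker record


@dataclass
class Record:
    kind: str          # "normal", "full_page" or "advance"
    size: int          # bytes the record occupies in the file
    advance: int = 0   # extra bytes added to the logical LSN ("advance" only)


def original_lsns(records: List[Record]) -> List[int]:
    """In the original WAL the LSN is simply the physical byte offset."""
    lsns, pos = [], 0
    for r in records:
        lsns.append(pos)
        pos += r.size
    return lsns


def compress(records: List[Record]) -> List[Record]:
    """Replace each full-page image with a short 'advance LSN' marker."""
    out = []
    for r in records:
        if r.kind == "full_page":
            # The marker accounts for the bytes that were cut out.
            out.append(Record("advance", ADVANCE_RECORD_SIZE,
                              advance=r.size - ADVANCE_RECORD_SIZE))
        else:
            out.append(r)
    return out


def replay_lsns(records: List[Record]) -> List[int]:
    """Logical LSN = physical offset + sum of advances seen so far."""
    lsns, pos, skew = [], 0, 0
    for r in records:
        lsns.append(pos + skew)
        pos += r.size
        skew += r.advance
    return lsns


wal = [Record("normal", 100), Record("full_page", 8192),
       Record("normal", 60), Record("normal", 40)]
small = compress(wal)
print(original_lsns(wal))
print(replay_lsns(small))
print(sum(r.size for r in wal), sum(r.size for r in small))
```

Both LSN lists come out identical here, while the compressed stream is a small fraction of the original size. Whether the redo code could cope with an LSN that is no longer a file offset is exactly the open question.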

Andreas
