Re: [HACKERS] Full page writes improvement, code update

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>
Cc: "Koichi Suzuki" <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>, "Hannu Krosing" <hannu(at)skype(dot)net>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, "Simon Riggs" <simon(at)2ndquadrant(dot)com>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Full page writes improvement, code update
Date: 2007-04-13 14:36:44
Message-ID: 19002.1176475004@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

"Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at> writes:
> But you also turn off the optimization that avoids writing regular
> WAL records when the info is already contained in a full-page image
> (increasing the uncompressed size of WAL).
> It was that part I questioned.

That's what bothers me about this patch, too. It will be increasing
the cost of writing WAL (more data -> more CRC computation and more
I/O, not to mention more contention for the WAL locks) which translates
directly to a server slowdown.

The main arguments that I could see against Andreas' alternative are:

1. Some WAL record types are arranged in a way that actually would not
permit the reconstruction of the short form from the long form, because
they throw away too much data when the full-page image is substituted.
An example that's fresh in my mind is that the current format of the
btree page split WAL record discards newitemoff in that case, so you
couldn't identify the inserted item in the page image. Now this is only
saving two bytes in what's usually going to be a darn large record
anyway, and it complicates the code to do it, so I wouldn't cry if we
changed btree split to include newitemoff always. But there might be
some other cases where more data is involved. In any case, someone
would have to look through every single WAL record type to determine
whether reconstruction is possible and fix it if not.

2. The compresslog utility would have to have specific knowledge about
every compressible WAL record type, to know how to convert it to the
short format. That means an ongoing maintenance commitment there.
I don't think this is unacceptable, simply because we need only teach
it about a few common record types, not everything under the sun ---
anything it doesn't know how to fix, just leave alone, and if it's an
uncommon record type it really doesn't matter. (I guess that means
that we don't really have to do #1 for every last record type, either.)

So I don't think either of these is a showstopper. Doing it this way
would certainly make the patch more acceptable, since the argument that
it might hurt rather than help performance in some cases would go away.

> What about disconnecting WAL LSN from physical WAL record position
> during replay ?
> Add simple short WAL records in pg_compresslog like: advance LSN by 8192
> bytes.

I don't care for that, as it pretty much destroys some of the more
important sanity checks that xlog replay does. The page boundaries
need to match the records contained in them. So I think we do need
to have pg_decompresslog insert dummy WAL entries to fill up the
space saved by omitting full pages.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bruce Momjian 2007-04-13 15:09:04 Re: Reviewers Guide to DeferredTransactions/TransactionGuarantee
Previous Message Simon Riggs 2007-04-13 14:33:18 Re: Group Commit

Browse pgsql-patches by date

  From Date Subject
Next Message Bruce Momjian 2007-04-13 15:09:04 Re: Reviewers Guide to DeferredTransactions/TransactionGuarantee
Previous Message Zeugswetter Andreas ADI SD 2007-04-13 13:40:31 Re: [PATCHES] Reviewers Guide to Deferred Transactions/Transaction Guarantee