Re: [HACKERS] Full page writes improvement, code update

From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: Josh Berkus <josh(at)agliodbs(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Full page writes improvement, code update
Date: 2007-04-24 01:15:15
Message-ID: 462D5A23.9060706@oss.ntt.co.jp
Lists: pgsql-hackers pgsql-patches

Hi,

Sorry, there are so many comments and questions that I'll reply inline.

Josh Berkus wrote:
> Hackers,
>
>> Writing lots of additional code simply to remove a parameter that
>> *might* be mis-interpreted doesn't sound useful to me, especially when
>> bugs may leak in that way. My take is that this is simple and useful
>> *and* we have it now; other ways don't yet exist, nor will they in time
>> for 8.3.
>
> How about naming the parameter wal_compressable? That would indicate pretty
> clearly that the parameter is intended to be used with wal_compress and
> nothing else.

Hmm, it sounds nicer.

>
> However, I do agree with Andreas that anything which adds to WAL volume, even
> 3%, seems like going in the wrong direction. We already have higher log
> output than any comparable database (higher than InnoDB by 3x) and we should
> be looking for output to trim as well as compression.
>
> So the relevant question is whether the patch in its current form provides
> enough benefit to make it worthwhile for 8.3, or whether we should wait for
> 8.4. Questions:
>

Before answering the questions below, I'd like to say that archive log
optimization has to address several points of view on the current
(up to 8.2) settings.

1) To deal with partial/inconsistent writes to the data files at crash
recovery, we need full page writes at the first modification to each
page after each checkpoint. These consume much of the WAL space.

2) The full page writes in 1) are not necessary for archive recovery
(PITR) and can be removed for this purpose. However, we need full page
writes during hot backup to deal with partial writes by backup
commands. This is implemented in 8.2.

3) To keep crash recovery possible while reducing the amount of archive
log, removing unnecessary full page writes from the archive logs is a
good choice. To do this, we need both the logical log and full page
writes in WAL.

I don't think there should be only one setting. It depends on how the
database is operated. Leaving wal_add_optimization_info = off as the
default does not change WAL or archive log handling at all. I
understand some people may not be happy with an additional 3% or so
increase in WAL size, especially people who don't need archive logs at
all. So I prefer to leave the default off.

For users, I think this is simple enough:

1) For people happy with 8.2 settings:
No change is needed to move to 8.3 and there's really no change.

2) For people who need to reduce archive log size but want to keep full
page writes in WAL (to keep crash recovery possible):
a) Add GUC parameter: wal_add_optimization_info=on
b) Change archive command from "cp" to "pg_compresslog"
c) Change restore command from "cp" to "pg_decompresslog"

Archive logs can be stored and restored as in older releases.
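
As a rough sketch of setup 2), the settings might look like the
following. The archive directory is a placeholder, the parameter name
is the one used by this patch (it may still be renamed), and the
argument order for pg_compresslog and pg_decompresslog is only assumed
to mirror cp, since they are described above as replacements for it.

```
# postgresql.conf
full_page_writes = on               # left on, to keep crash recovery possible
wal_add_optimization_info = on      # parameter from this patch; default is off
archive_command = 'pg_compresslog %p /mnt/server/archivedir/%f'    # assumed cp-like arguments

# recovery.conf
restore_command = 'pg_decompresslog /mnt/server/archivedir/%f %p'  # assumed cp-like arguments
```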

> 1) is there any throughput benefit for platforms with fast CPU but constrained
> I/O (e.g. 2-drive webservers)? Any penalty for servers with plentiful I/O?

I've only run benchmarks with the archive process running, because
wal_add_optimization_info=on does not make sense if we don't archive
WAL. In this situation, total I/O decreases because writes to the
archive log decrease. Because WAL grows by 3% or so, WAL writes
increase, but the decrease in archive writes more than makes up for it.
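
To make this trade-off concrete, here is a back-of-the-envelope sketch
in Python. It counts only bytes written to WAL and to the archive,
takes the 3% figure from above, and assumes a plain cp archive is the
same size as the WAL. The archive_ratio value is a hypothetical input,
not a benchmark result.

```python
def total_write_bytes(wal_bytes, wal_overhead=0.0, archive_ratio=1.0):
    """Bytes written to WAL plus bytes written to the archive.

    wal_overhead:  fractional WAL growth (0.03 for the ~3% above).
    archive_ratio: archived size / WAL size (1.0 for plain cp);
                   a hypothetical input, not a measured value.
    """
    wal_written = wal_bytes * (1.0 + wal_overhead)
    return wal_written + wal_written * archive_ratio


def break_even_archive_ratio(wal_overhead=0.03):
    """Largest archive_ratio at which total writes do not exceed the
    plain-cp baseline of 2 * wal_bytes, i.e. (1 + o) * (1 + r) = 2."""
    return 2.0 / (1.0 + wal_overhead) - 1.0


if __name__ == "__main__":
    print(round(break_even_archive_ratio(0.03), 4))
```

With a 3% WAL increase the break-even ratio is about 0.94, so under
these assumptions total writes drop as soon as pg_compresslog shrinks
the archive by more than roughly 6%.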

>
> 2) Will this patch make attempts to reduce WAL volume in the future
> significantly harder?

Yes, I'd like to continue to work to reduce the WAL size. It's still
an issue when the database grows to several hundred gigabytes.
Anyway, I think WAL size reduction has to be done in
XLogInsert() or XLogWrite(). We need much more discussion for this.
The issue will be how to keep crash recovery possible after inconsistent
writes (with full_page_writes=off, we have to give that up). On the other
hand we have to keep examining each WAL record.

>
> 3) How is this better than command-line compression for log-shipping? e.g.
> why do we need it in the database?

I don't fully understand what command-line compression means. Simon
suggested that this patch can be used with log-shipping and I agree.
If we compare it with gzip or other general-purpose compression, the
compression ratio, CPU usage and I/O of pg_compresslog are all
considerably better than those of gzip.

Please let me know if you meant something different.

Regards;

--
-------------
Koichi Suzuki
