Re: [HACKERS] Full page writes improvement, code update

From: Josh Berkus <josh(at)agliodbs(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Cc: "Zeugswetter Andreas ADI SD" <ZeugswetterA(at)spardat(dot)at>, "Koichi Suzuki" <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Full page writes improvement, code update
Date: 2007-04-24 15:31:42
Message-ID: 200704240831.42873.josh@agliodbs.com
Lists: pgsql-hackers pgsql-patches

Koichi, Andreas,

> 1) To deal with partial/inconsistent writes to the data files during crash
> recovery, we need full page writes at the first modification to each page
> after every checkpoint. These consume a large share of WAL space.

We need to find a way around this someday. Other DBs don't do this; it may be
because they're less durable, or because they fixed the problem.
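The "first modification after each checkpoint" rule from the quoted point 1) can be pictured with the small model below. It is a simplified sketch, not PostgreSQL source: the `Page` class and the `needs_full_page_image`/`modify` helpers are hypothetical. The one idea taken from the server is that a page needs a full-page image when its LSN is at or before the redo pointer of the most recent checkpoint, meaning no WAL record has touched it since that checkpoint began.

```python
from dataclasses import dataclass


@dataclass
class Page:
    # LSN of the last WAL record that modified this page (hypothetical model).
    lsn: int


def needs_full_page_image(page: Page, redo_ptr: int, full_page_writes: bool = True) -> bool:
    """True if the next change to `page` must log a full-page image.

    A page whose LSN is <= the checkpoint's redo pointer has not been
    modified since that checkpoint, so this is its first change after it.
    """
    return full_page_writes and page.lsn <= redo_ptr


def modify(page: Page, redo_ptr: int, next_lsn: int) -> bool:
    """Simulate logging one change; return whether a full-page image was written."""
    fpi = needs_full_page_image(page, redo_ptr)
    page.lsn = next_lsn  # the new WAL record stamps the page with its LSN
    return fpi
```

With a checkpoint redo pointer of 100, a page last stamped at LSN 50 gets a full image on its first change but not on the second; after a later checkpoint moves the redo pointer past the page's LSN, the next change needs an image again. That repetition after every checkpoint is where the WAL volume comes from.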

> I don't think there should be only one setting. It depends on how the
> database is operated. Leaving wal_add_optimization_info = off by default
> does not change WAL or archive log handling at all. I
> understand some people may not be happy with an additional 3% or so
> increase in WAL size, especially people who don't need archive logs at
> all. So I prefer to leave the default off.

Other than that, is there any reason to turn this off if we are archiving? Maybe
it should just be slaved to archive_command ... if we're not using PITR, it's
off; if we are, it's on.
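The proposal to slave the option to archive_command amounts to deriving the value instead of exposing a separate setting. The sketch below is hypothetical: `wal_add_optimization_info` was only a proposed name in this thread, and the helper functions are invented for illustration. It assumes, as in the 8.x servers, that archiving counts as active whenever archive_command is non-empty.

```python
def archiving_active(archive_command: str) -> bool:
    # Assumption for this sketch: any non-empty archive_command means
    # WAL archiving (and hence PITR) is in use.
    return archive_command != ""


def wal_add_optimization_info(archive_command: str) -> bool:
    """Hypothetical derived setting: on exactly when WAL is being archived."""
    return archiving_active(archive_command)
```

Under this design there is nothing for the user to tune: a server without archiving pays none of the roughly 3% WAL overhead, and an archiving server always gets records that a tool like pg_compresslog can shrink.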

> > 1) is there any throughput benefit for platforms with fast CPU but
> > constrained I/O (e.g. 2-drive webservers)? Any penalty for servers with
> > plentiful I/O?
>
> I've only run benchmarks with the archive process running, because
> wal_add_optimization_info=on does not make sense if we don't archive
> WAL. In this situation, total I/O decreases because writes to the archive
> log decrease. The 3% or so increase in WAL size does increase WAL
> writes, but the decrease in archive writes makes up for it.

Yeah, I was just looking for a way to make this a performance feature. I see
now that it can't be. ;-)

> > 3) How is this better than command-line compression for log-shipping?
> > e.g. why do we need it in the database?
>
> I don't fully understand what command-line compression means. Simon
> suggested that this patch can be used with log-shipping and I agree.
> Compared with gzip or other general-purpose compression,
> pg_compresslog does considerably better on compression ratio, CPU
> usage and I/O.

OK, that answered my question.
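Why a WAL-aware tool can beat gzip is easiest to see with a toy model. This sketch reflects one reading of the thread rather than pg_compresslog's actual format: it assumes that with wal_add_optimization_info on, a record carrying a full-page image also keeps the logical change data, so an archiver can drop the 8 kB image and still replay the change. The `WalRecord` type and the `compress_for_archive`/`archived_size` helpers are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class WalRecord:
    lsn: int
    full_page_image: Optional[bytes]  # present on the first change after a checkpoint
    logical_change: Optional[bytes]   # redo data describing the change itself


def compress_for_archive(records: List[WalRecord]) -> List[WalRecord]:
    """Drop full-page images only where the logical change is still available."""
    out = []
    for rec in records:
        if rec.full_page_image is not None and rec.logical_change is not None:
            out.append(replace(rec, full_page_image=None))
        else:
            # Image-only records (the "full page optimization" left on) must be kept whole.
            out.append(rec)
    return out


def archived_size(records: List[WalRecord]) -> int:
    return sum(len(r.full_page_image or b"") + len(r.logical_change or b"") for r in records)
```

A general-purpose compressor can only squeeze the page images; this approach removes them outright when it is safe to. The image-only branch is also why the naming debate that follows matters: if the optimization that omits logical data stays on, the tool has less it can drop.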

> This is why I don't like Josh's suggested name of wal_compressable
> either.
> WAL is compressible either way, only pg_compresslog would need to be
> more complex if you don't turn off the full page optimization. I think a
> good name would tell that you are turning off an optimization.
> (thus my wal_fullpage_optimization on/off)

Well, as a PG hacker I find the name wal_fullpage_optimization quite baffling
and I think our general user base will find it even more so. Now that I have
Koichi's explanation of the problem, I vote for simply slaving this to the
PITR settings and not having a separate option at all.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
