Re: [HACKERS] Full page writes improvement, code update

From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Simon Riggs <simon(at)2ndquadrant(dot)com>, josh(at)agliodbs(dot)com, pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Full page writes improvement, code update
Date: 2007-04-11 01:09:43
Message-ID: 461C3557.6040201@oss.ntt.co.jp
Lists: pgsql-hackers pgsql-patches

Hi,

In the case below, we ran the DBT-2 benchmark for one hour to take these
measurements. Checkpoints occurred three times (the checkpoint interval was
20 min).

For more information, when the checkpoint interval was one hour, the archived
log sizes were as follows:
cp: 3.1GB
gzip: 1.5GB
pg_compresslog: 0.3GB

In both cases (20 min and one hour checkpoint intervals), the database size
was 12.7GB, which is relatively small.
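
For clarity, the three archiving methods compared here differ only in the
archive_command set in postgresql.conf. The setups are roughly as follows
(the archive paths are placeholders, and see the patch itself for the precise
pg_compresslog invocation):

  # plain copy of each WAL segment (%p = path of the segment, %f = its name)
  archive_command = 'cp %p /mnt/archive/%f'

  # compress each segment with gzip
  archive_command = 'gzip < %p > /mnt/archive/%f.gz'

  # strip full page writes with the proposed tool
  archive_command = 'pg_compresslog %p /mnt/archive/%f'

At restore time, the pg_compresslog case is paired with pg_decompresslog in
restore_command, which rebuilds the removed full page writes as dummy
records, as described in the quoted proposal below.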

As you point out, if we stretched the checkpoint interval indefinitely, the
value for cp would approach that for pg_compresslog, but that is not
practical.

The point here is that if we collect archive logs with cp and the average
workload is a quarter of full load, cp archiving will produce about 0.8GB of
archive log per hour (in the DBT-2 case; of course, the size depends on the
nature of the transactions). If we run the database all day, the archive log
will grow to roughly the size of the database itself; after one week, it will
be about seven times as large as the database. This is the point. In
production, such a large archive log raises the storage cost. The purpose of
the proposal is not to improve performance, but to decrease the size of the
archive log and save the storage it needs, while preserving the same chance
of recovery during crash recovery as full_page_writes=on.
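
Spelling that estimate out with the one-hour-checkpoint figures above
(rounding loosely):

  3.1GB/hour x 1/4 load       = about 0.8GB/hour of cp archive
  0.8GB/hour x 24 hours       = about 19GB/day, on the order of the 12.7GB database
  one database-size per day   = about seven database-sizes after a week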

Because of the nature of DBT-2, it is not meaningful to compare throughput
(the database size determines the number of transactions to run). Instead, I
compared throughput using pgbench. The results are: cp: 570tps, gzip: 558tps,
pg_compresslog: 574tps, a negligible difference.
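
For reference, those tps figures are what pgbench reports at the end of a
run. I omit the scale factor and client count here; the values below are
placeholders only, just to illustrate the shape of the run:

  pgbench -i -s 100 bench        # initialize a scale-100 test database (placeholder scale)
  pgbench -c 10 -t 10000 bench   # 10 clients, 10000 transactions each (placeholder settings)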

As for the idle time of gzip and the other archiving commands, nothing in the
environment differed other than the command used to archive WAL. My guess is
that because gzip's user time is so large, the scheduler has more chances to
give resources to other processes. In the case of cp, idle time is more than
30 times longer than user time; pg_compresslog spends about seven times as
much idle time as user time; gzip, on the other hand, spends less idle time
than user time. Considering the total amount of user time, I think these are
reasonable measurements.

Again, the point of my proposal is not to increase run-time performance. The
point is to decrease the size of the archive log to save storage.

Regards,

Tom Lane wrote:
> Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp> writes:
>> My proposal is to remove unnecessary full page writes (they are needed
>> in crash recovery from inconsistent or partial writes) when we copy WAL
>> to the archive log, and to rebuild them as dummies when we restore from
>> the archive log.
>> ...
>> Benchmark: DBT-2
>> Database size: 120WH (12.3GB)
>> Total WAL size: 4.2GB (after 60min. run)
>> Elapsed time:
>> cp: 120.6sec
>> gzip: 590.0sec
>> pg_compresslog: 79.4sec
>> Resultant archive log size:
>> cp: 4.2GB
>> gzip: 2.2GB
>> pg_compresslog: 0.3GB
>> Resource consumption:
>> cp: user: 0.5sec system: 15.8sec idle: 16.9sec I/O wait: 87.7sec
>> gzip: user: 286.2sec system: 8.6sec idle: 260.5sec I/O wait: 36.0sec
>> pg_compresslog:
>> user: 7.9sec system: 5.5sec idle: 37.8sec I/O wait: 28.4sec
>
> What checkpoint settings were used to make this comparison? I'm
> wondering whether much of the same benefit can't be bought at zero cost
> by increasing the checkpoint interval, because that translates directly
> to a reduction in the number of full-page images inserted into WAL.
>
> Also, how much was the database run itself slowed down by the increased
> volume of WAL (due to duplicated information)? It seems rather
> pointless to me to measure only the archiving effort without any
> consideration for the impact on the database server proper.
>
> regards, tom lane
>
> PS: there's something fishy about the gzip numbers ... why all the idle
> time?
>

--
Koichi Suzuki
