Re: [PATCHES] Full page writes improvement, code update

From: Koichi Suzuki <suzuki(dot)koichi(at)oss(dot)ntt(dot)co(dot)jp>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org, pgsql-patches(at)postgresql(dot)org
Subject: Re: [PATCHES] Full page writes improvement, code update
Date: 2007-03-28 01:54:11
Message-ID: 4609CAC3.7060008@oss.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

Simon;

Thanks a lot for your comments/advices. I'd like to write some feedback.

Simon Riggs wrote:
> On Tue, 2007-03-27 at 11:52 +0900, Koichi Suzuki wrote:
>
>> Here's an update of a code to improve full page writes as proposed in
>>
>> http://archives.postgresql.org/pgsql-hackers/2007-01/msg01491.php
>> and
>> http://archives.postgresql.org/pgsql-patches/2007-01/msg00607.php
>>
>> Update includes some modification for error handling in archiver and
>> restoration command.
>>
>> In the previous threads, I posted several evaluation and shown that we
>> can keep all the full page writes needed for full XLOG crash recovery,
>> while compressing archive log size considerably better than gzip, with
>> less CPU consumption. I've found no further objection for this proposal
>> but still would like to hear comments/opinions/advices.
>
> Koichi-san,
>
> Looks interesting. I like the small amount of code to do this.
>
> A few thoughts:
>
> - Not sure why we need "full_page_compress", why not just mark them
> always? That harms noone. (Did someone else ask for that? If so, keep
> it)

No, no one asked to have a separate option. There'll be no bad
influence to do so. As written below, full page write can be
categolized as follows:

1) Needed for crash recovery: first page update after each checkpoint.
This has to be kept in WAL.

2) Needed for archive recovery: page update between pg_start_backup and
pg_stop_backup. This has to be kept in archive log.

3) For log-shipping slave such as pg_standby: no full page writes will
be needed for this purpose.

My proposal deals with 2). So, if we mark each "full_page_write", I'd
rather mark when this is needed. Still need only one bit because the
case 3) does not need any mark.

> - OTOH I'd like to see an explicit parameter set during recovery since
> you're asking the main recovery path to act differently in case a single
> bit is set/unset. If you are using that form of recovery, we should say
> so explicitly, to keep everybody else safe.

Only one thing I had to do is to create "dummy" full page write to
maintain LSNs. Full page writes are omitted in archive log. We have to
LSNs same as those in the original WAL. In this case, recovery has to
read logical log, not "dummy" full page writes. On the other hand, if
both logical log and "real" full page writes are found in a log record,
the recovery has to use "real" full page writes.

> - I'd rather mark just the nonremovable blocks. But no real difference

It sound nicer. According to the full page write categories above, we
can mark full page writes as needed in "crash recovery" or "archive
recovery". Please give some feedback to the above full page write
categories.

>
> - We definitely don't want an normal elog in a XLogInsert critical
> section, especially at DEBUG1 level

I agree. I'll remove elog calls from critical sections.

> - diff -c format is easier and the standard

I'll change the diff option.

> - pg_logarchive and pg_logrestore should be called by a name that
> reflects what they actually do. Possibly pg_compresslog and
> pg_decompresslog etc.. I've not reviewed those programs, but:

I wasn't careful to have command names more based on what to be done.
I'll change the command name.

>
> - Not sure why we have to compress away page headers. Touch as little as
> you can has always been my thinking with recovery code.

Eliminating page headers gives compression rate slightly better, a
couple of percents. To make code simpler, I'll drop this compression.

>
> - I'm very uncomfortable with touching the LSN. Maybe I misunderstand?

The reason why we need pg_logarchive (or pg_decompresslog) is to
maintain LSN the same as those in the original WAL. For this purpose,
we need "dummy" full page write. I don't like to touch LSN either and
this is the reason why pg_decompresslog need some extra work,
eliminating many other changes in the recovery.

>
> - Have you thought about how pg_standby would integrate with this
> option? Can you please?

Yes I believe so. As pg_standby does not include any chance to meet
partial writes of pages, I believe you can omit all the full page
writes. Of course, as Tom Lange suggested in
http://archives.postgresql.org/pgsql-hackers/2007-02/msg00034.php
removing full page writes can lose a chance to recover from
partial/inconsisitent writes in the crash of pg_standby. In this case,
we have to import a backup and archive logs (with full page writes
during the backup) to recover. (We have to import them when the file
system crashes anyway). If it's okay, I believe
pg_compresslog/pg_decompresslog can be integrated with pg_standby.

Maybe we can work together to include pg_compresslog/pg_decompresslog in
pg_standby.

>
> - I'll do some docs for this after Freeze, if you'd like. I have some
> other changes to make there, so I can do this at the same time.

I'll be looking forward to your writings.

Best regards;

--
Koichi Suzuki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2007-03-28 02:15:13 Re: Warning on contrib/tsearch2
Previous Message Dezso Zoltan 2007-03-28 01:44:00 Re: Server-side support of all encodings

Browse pgsql-patches by date

  From Date Subject
Next Message ITAGAKI Takahiro 2007-03-28 05:47:12 Re: O_DIRECT support for Windows
Previous Message ITAGAKI Takahiro 2007-03-28 01:23:09 Small code clean-up