Performance and WAL on big inserts/updates

From: Marty Scholes <marty(at)outputservices(dot)com>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Performance and WAL on big inserts/updates
Date: 2004-03-11 21:49:59
Message-ID: 4050DF07.1040301@outputservices.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I combed the archives but could not find a discussion on this and am
amazed this hasn't been discussed.

My experience with Oracle (and now limited experience with Pg) is that
the major choke point in performance is not the CPU or read I/O, it is
the log performance of big update and select statements.

Essentially, the data is written twice: first to the log and then the
data files. This would be ok except the transaction is regularly frozen
while the log files sync to disk with a bunch of tiny (8KB for Oracle
and Pg) write requests.

I realize that the logs must be there to ensure crash recovery and that
PITR is on the way to supplement this.

If a transaction will do large updates or inserts, why don't we just log
the parsed statements in the WAL instead of the individual data blocks
that have changed? Then, the data files could be fsync()ed every
checkpoint, but essentially no write I/O takes places in the interim.

Outside of checkpoints, large updates would only need to fsync() a very
small addition to the log files.

Recovery could be similar to how I understand it currently is:
1. Roll back in-flight changes
2. Roll forward log entries in order, either direct changes to the data
or re-execute the parsed command in the log.

Some informal testing suggests that we get a factor of 8 improvement in
speed here if we completely disable fsync() in large updates under Pg.

I would suspect that a big portion of those gains would be preserved if
fsync() calls were limited to checkpoints and saving the parsed SQL
command in the log.

Why have I not seen this in any database?

There must be a reason.

Thanks in advance.

Sincerely,
Marty

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rod Taylor 2004-03-11 22:06:20 Re: Performance and WAL on big inserts/updates
Previous Message Neil Conway 2004-03-11 21:45:02 Re: unsafe floats