> Torn pages (partial page write) are still a problem.
I revised the idea to use MINIMAL XLOG (instead of WITHOUT XLOG), as
described below. I think this way we can always guarantee correctness and
still keep improving it.
To Use It
A "BEGIN TRANSACTION MINIMAL XLOG/END" block is a special "BEGIN/END"
transaction block. It tries to avoid unnecessary xlogs but still preserves
transaction semantics. It is good for situations where the user wants to
do a big data load. It is issued like this:
1. BEGIN TRANSACTION MINIMAL XLOG
2. ... /* statements */
3. END
From the user's view, it is almost the same as an ordinary transaction: if
everything runs smoothly from step 1 to 3, the transaction will be made
durable. If any step fails (including ABORT, transaction error, or system
crash), it looks as if nothing happened. To make life easier, no
subtransactions are allowed.
To Implement It
At step 1, we disallow some operations, including VACUUM and PITR.
At step 2, only minimal xlog entries are logged. If anything inside
fails, handle it like an ordinary transaction.
At step 3, we issue a checkpoint, then mark the transaction committed. The
checkpoint forces every page the block dirtied to disk, so the changes that
were not logged are durable before the commit is recorded. If step 3
itself fails, handle it like an ordinary transaction.
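A toy model of the three steps, as a sketch only: the type and function
names (MxTxn, mx_begin, mx_end, mx_abort, mx_finish) are hypothetical and do
not exist in PostgreSQL, and re-enabling VACUUM/PITR once the block ends is
my assumption. What it shows is the ordering at step 3: checkpoint first,
commit second, with any failure falling back to an ordinary abort.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a MINIMAL XLOG block; none of these names
 * exist in PostgreSQL. */
typedef enum { MX_IDLE, MX_IN_BLOCK, MX_COMMITTED, MX_ABORTED } MxState;

typedef struct {
    MxState state;
    bool vacuum_allowed;   /* cleared at step 1 */
    bool pitr_allowed;     /* cleared at step 1 */
    int checkpoints;       /* checkpoints issued at step 3 */
} MxTxn;

/* Step 1: enter the block and disallow VACUUM and PITR. */
static void mx_begin(MxTxn *t)
{
    assert(t->state == MX_IDLE);
    t->vacuum_allowed = false;
    t->pitr_allowed = false;
    t->state = MX_IN_BLOCK;
}

/* Assumption: the disallowed operations are re-enabled when the block ends. */
static void mx_finish(MxTxn *t, MxState outcome)
{
    t->state = outcome;
    t->vacuum_allowed = true;
    t->pitr_allowed = true;
}

/* Any failure inside the block is handled like an ordinary abort. */
static void mx_abort(MxTxn *t)
{
    mx_finish(t, MX_ABORTED);
}

/* Step 3: checkpoint first so every page the block dirtied is on disk,
 * then mark the transaction committed. checkpoint_ok simulates a
 * failing checkpoint, which falls back to an ordinary abort. */
static bool mx_end(MxTxn *t, bool checkpoint_ok)
{
    assert(t->state == MX_IN_BLOCK);
    if (!checkpoint_ok) {
        mx_abort(t);
        return false;
    }
    t->checkpoints++;
    mx_finish(t, MX_COMMITTED);
    return true;
}
```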
Correctness is easy to see: if we let "minimal xlog" equal "all xlog",
then this is exactly an ordinary transaction plus a checkpoint inside the
transaction block.
Based on the above argument, we can implement it in the following steps:
1. First build the framework without revising any XLogInsert() call, so
the implementation is correct from the start;
2. Examine each XLogInsert() call and differentiate what it logs depending
on whether MINIMAL XLOG is set.
The point of these steps is that there is no need to decide up front
exactly what the MINIMAL XLOG content is; we can work it out gradually in
a non-invasive way.
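One way to make steps 1 and 2 concrete, assuming a single hypothetical
predicate mx_must_log() consulted before each XLogInsert() call site
(nothing like it exists in PostgreSQL today; the Rm enum and MxRule table
are also made up for illustration): with an empty rule table it logs
everything, which is the "minimal xlog = all xlog" case above, and each
rule added later marks one record type as skippable.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical record identifiers loosely named after PostgreSQL's
 * resource managers; real RmgrId and info values differ. */
typedef enum { RM_HEAP, RM_BTREE, RM_HASH, RM_GIST, RM_OTHER } Rm;

typedef struct {
    Rm rm;
    int kind;        /* record kind within the resource manager */
    bool skippable;  /* may be omitted under MINIMAL XLOG */
} MxRule;

/* Step 1: the rule table starts empty, so nothing is skippable and
 * MINIMAL XLOG writes exactly what an ordinary transaction writes. */
static const MxRule *mx_rules = NULL;
static size_t mx_nrules = 0;

/* Step 2: install rules as each XLogInsert() call site is examined. */
static void mx_set_rules(const MxRule *rules, size_t n)
{
    mx_rules = rules;
    mx_nrules = n;
}

/* Consulted before a (hypothetical) XLogInsert() call site. */
static bool mx_must_log(bool minimal_xlog, Rm rm, int kind)
{
    if (!minimal_xlog)
        return true;
    for (size_t i = 0; i < mx_nrules; i++)
        if (mx_rules[i].rm == rm && mx_rules[i].kind == kind)
            return !mx_rules[i].skippable;
    return true;   /* unexamined record: log it, the safe default */
}
```

Starting from the empty table keeps the framework equivalent to an ordinary
transaction plus a checkpoint; each rule can then be added and reviewed on
its own.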
The xlog of a failed transaction is not totally useless, since a later
transaction may rely on something it created, for example a new page and
its links in a btree. We have to identify and keep these xlogs.
RM_HEAP_ID: The problem for heap is torn page prevention. We currently copy
the whole page into xlog the first time it is touched after a checkpoint,
so we always have this copy to replace a data file page that might have
been torn during a write. I haven't come up with a good way to avoid this
so far, so we keep it. (We could possibly avoid copying a P_NEW page, but
that's another story.) So what we can avoid logging includes at least the
inserts/updates/deletes on pages that don't need to be copied, which I
think will give us about a 50% reduction in xlog volume and contention.
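The heap rule above as a sketch, in a simplified model: Lsn stands in for
XLogRecPtr, and heap_log_action() is a hypothetical helper, not a PostgreSQL
function. It mirrors the existing full-page-write test (a page whose LSN is
not past the last checkpoint's redo pointer has not been logged since that
checkpoint) and only adds the MINIMAL XLOG branch.

```c
#include <stdbool.h>
#include <stdint.h>

typedef uint64_t Lsn;   /* stand-in for PostgreSQL's XLogRecPtr */

typedef enum {
    HEAP_LOG_NORMAL,          /* ordinary record, no page image */
    HEAP_LOG_WITH_PAGE_IMAGE, /* record plus full copy of the page */
    HEAP_LOG_SKIP             /* nothing written (MINIMAL XLOG only) */
} HeapLogAction;

/* page_lsn: LSN of the last logged change to the page.
 * redo_ptr: redo pointer of the most recent checkpoint.
 * A page whose LSN is not past redo_ptr has not been logged since that
 * checkpoint, so its first change must carry a full page copy; this is
 * kept even under MINIMAL XLOG as torn-page protection. */
static HeapLogAction heap_log_action(bool minimal_xlog, Lsn page_lsn,
                                     Lsn redo_ptr)
{
    if (page_lsn <= redo_ptr)
        return HEAP_LOG_WITH_PAGE_IMAGE;
    return minimal_xlog ? HEAP_LOG_SKIP : HEAP_LOG_NORMAL;
}
```

Once the first change to a page has been logged with its image, the page
LSN is past the redo pointer, so later changes to that page take the skip
branch until the next checkpoint.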
RM_BTREE_ID/RM_HASH_ID/RM_GIST_ID: For indexes, things get more complex. We
need the xlogs to maintain the structure of the btree index, like the
pointers, high keys, etc., but the content is not necessarily needed. More
research is needed here.
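Since this part still needs research, the following is only one conceivable
split, using hypothetical record kinds that do not match nbtree's real xlog
codes: records that change index structure are always logged (which also
covers the failed-transaction case, since a later transaction may descend
through a page the aborted one created), and only leaf content becomes a
candidate for skipping. Skipping leaf content would still need the heap's
first-touch page-image rule for torn pages, which this sketch leaves out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical btree record kinds; the real nbtree xlog info codes
 * differ in name and granularity. */
typedef enum {
    BT_INSERT_LEAF,   /* new index tuple on a leaf page: content */
    BT_INSERT_UPPER,  /* downlink added to an internal page: structure */
    BT_SPLIT,         /* page split, right-links, high key: structure */
    BT_NEWROOT        /* new root page: structure */
} BtRecord;

/* Structural records are kept even if the inserting transaction later
 * aborts, because a later transaction may rely on the new pages. */
static bool bt_is_structural(BtRecord r)
{
    switch (r) {
    case BT_INSERT_LEAF:
        return false;
    case BT_INSERT_UPPER:
    case BT_SPLIT:
    case BT_NEWROOT:
        return true;
    }
    assert(0 && "unknown btree record kind");
    return true;   /* log by default */
}

static bool bt_must_log(bool minimal_xlog, BtRecord r)
{
    return !minimal_xlog || bt_is_structural(r);
}
```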
It is hard to avoid much here, but these records are not a major
contributor to xlog volume.