Re: XLog: how to log?

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: XLog: how to log?
Date: 2004-05-11 20:25:37
Message-ID: 1084307137.3028.2020.camel@stromboli
Lists: pgsql-hackers

On Tue, 2004-05-11 at 16:33, Bruce Momjian wrote:
> Tom Lane wrote:
> > Alvaro Herrera <alvherre(at)dcc(dot)uchile(dot)cl> writes:
> > > Hmm ... I think it should be forbidden to quote a subtrans Xid as
> > > rollforward point. Not sure if that can be done though, or how to do
> > > it.
> >
> > Seems like a nonissue, unless the XLOG trace makes a subtrans look the
> > same as a main trans, which it'd not do would it?
> >

I agree that a subtrans xid should not be a valid rollforward point.

Forgive me for discussing what seem like obvious points - I'm sure you
appreciate we need an exact statement of how/when to terminate recovery
and that might be found in looking harder at the subtrans questions.
This is my third re-write of this e-mail, since I keep thinking of
additional things while going for the "definitive statement". I had
thought this was straightforward...

Currently, recovery loops until end of xlogs. There is no exit condition
from the loop. There is not currently a timestamp on the xlogs -
anywhere apart from the file date on each xlog.

Xids are assigned sequentially to transactions as they start. However,
Xids are not committed sequentially. Moreover, checkpoint records do not
wait for transactions to complete, so a checkpoint could record an Xid,
yet a lower Xid might still be in progress and commit sometime after the
checkpoint. So, when we do a backup, we might take with us a pg_control
that has a particular Xid, only to find many lower-numbered Xids in the
xlogs that committed later. So a target Xid can have no lower bound (and
a fully formed clog is essential to recovery).

If we go searching for a particular Xid, there is no way to tell whether
an Xid suggested by a user is too big or too small for use as a recovery
target. We need to recover - it is the only way to tell; if we find an
Xid that matches, we stop. If not, we keep going until end of logs, when
we need to issue a "recovered fully - the Xid you gave was not valid"
message, which may take some time and is also very clearly not what was wanted.
(If they had wanted full recovery, they would have asked).
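
A minimal sketch of the problem, in Python rather than the backend's C. The record type, the function name and the sample Xids are all hypothetical; real xlog records carry far more than an Xid. It only illustrates that, because commit order differs from Xid order, a miss cannot be detected until the whole log has been replayed.

```python
from typing import Iterable, NamedTuple, Tuple


class CommitRecord(NamedTuple):
    """Hypothetical stand-in for a commit record found in the xlog."""
    xid: int


def recover_to_xid(records: Iterable[CommitRecord],
                   target_xid: int) -> Tuple[int, bool]:
    """Replay commit records in log order, stopping once target_xid commits.

    Returns (records_replayed, found). Seeing a higher or lower Xid tells
    us nothing, so there is no early exit other than an exact match.
    """
    replayed = 0
    for rec in records:
        replayed += 1
        if rec.xid == target_xid:
            return replayed, True
    return replayed, False


# Xids 100..104 were assigned in order, but they commit out of order,
# and Xid 102 never commits at all.
log = [CommitRecord(101), CommitRecord(103), CommitRecord(100), CommitRecord(104)]

print(recover_to_xid(log, 100))  # match found on the third record
print(recover_to_xid(log, 102))  # no match: the whole log gets replayed
```

With a target of 102 the loop replays every record before it can report that the Xid was never found, which is exactly the unwanted full recovery.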

So searching on an Xid is inherently a poor way to recover. Which is a
shame, because it seemed like an easy target. Unless of course, we live
with this vagueness and get on and build the XLogSpy...

Xlog records ARE written sequentially, so a timestamp written to the
xlogs COULD be used as a target for halting recovery. We would be able
to decide, ahead of starting recovery, whether we would be able to
sensibly recover to that point by using the pg_control checkpoint time
as the lower bound and the file write times of the highest xlog as the
upper bound. Once decided that the target timestamp lies between upper
and lower bounds, we begin recovery, knowing exactly where it will
complete.
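
The up-front check described above could be sketched roughly as follows. This is Python for illustration only; the function name and the sample times are assumptions, and how the two bounds would actually be read from pg_control and from the file system is left out.

```python
from datetime import datetime


def target_within_bounds(target: datetime,
                         checkpoint_time: datetime,
                         last_xlog_mtime: datetime) -> bool:
    """Decide, before replaying anything, whether target is reachable.

    Lower bound: the checkpoint time taken from pg_control.
    Upper bound: the file write time of the highest xlog.
    """
    return checkpoint_time <= target <= last_xlog_mtime


# Hypothetical values for a backup taken in the early morning.
checkpoint = datetime(2004, 5, 11, 2, 0)
newest_xlog = datetime(2004, 5, 11, 6, 30)

print(target_within_bounds(datetime(2004, 5, 11, 4, 0), checkpoint, newest_xlog))
print(target_within_bounds(datetime(2004, 5, 11, 7, 0), checkpoint, newest_xlog))
```

The first target lies inside the window and the second lies after the newest xlog, so only the first would be accepted as a recovery target.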

During recovery, we would search for a timestamp. If found exactly,
stop. If exceeded, stop. Any transactions not committed at that point
are, as we say, out of luck. This approach has a certainty about it
that I think is much better than the error-prone Xid hunting approach,
and is also more attuned to the human reality (time matters, Xids
don't).
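
A rough sketch of that stopping rule, again in Python with hypothetical names. It assumes every xlog record carries a timestamp (which is the proposal, not the current state), and it assumes one particular reading of the rule: a record stamped exactly at the target is applied before stopping, while a record stamped later is not applied.

```python
from typing import Iterable, List, NamedTuple


class XLogRecord(NamedTuple):
    """Hypothetical xlog record carrying the proposed timestamp."""
    timestamp: float
    payload: str


def replay_until(records: Iterable[XLogRecord],
                 target: float) -> List[XLogRecord]:
    """Replay records in log order until the target timestamp is reached.

    Relies on records being written sequentially, so timestamps never
    decrease and the first record past the target ends recovery.
    """
    applied = []
    for rec in records:
        if rec.timestamp > target:   # exceeded: stop without applying
            break
        applied.append(rec)
        if rec.timestamp == target:  # found exactly: apply, then stop
            break
    return applied


log = [XLogRecord(1.0, "a"), XLogRecord(2.0, "b"),
       XLogRecord(3.0, "c"), XLogRecord(5.0, "d")]

print([r.payload for r in replay_until(log, 3.0)])  # exact match on "c"
print([r.payload for r in replay_until(log, 4.0)])  # stops before "d"
```

Either way the loop has a definite exit, unlike the Xid search, which can only fail by running off the end of the logs.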

Earlier, Bruce and I had discussed that for reasons of time pressure,
the PITR code for this release would consist of
a) recovery to a particular Xid
b) later, a utility that allowed xlogs to be inspected to allow DBA to
decide which is the correct Xid to recover to.
Those ideas don't sound as good now....

Therefore: action on me? - add a timestamp to EACH xlog record -
something I had been shying away from.

On Tue, 2004-05-11 at 14:56, Alvaro Herrera wrote:
> (Unrelated: note that after main transaction commit, a committed
> subtransaction is indistinguishable from a committed main transaction --
> and with the current idea of XLog I have, after recovering a transaction
> tree from XLog there won't be any mark in pg_subtrans. So the system
> will not be exactly as it was before but it won't matter.)

I don't think we need a subtrans commit directly, since if the top-level
commits after the subtrans has committed, then we're good.

However, if a subtrans aborts, yet the top-level commits, there will be
data written to the database about an aborted transaction. We don't have
Undo, so the subtrans clog must be updated to show that the subtrans
aborted, otherwise we would read both the committed (top-level) and the
uncommitted data (subtrans).
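
A deliberately simplified model of why the abort has to survive recovery. It is Python, the clog is reduced to a dictionary of Xid to status, and the visibility rule is an assumption made for illustration; none of this is the backend's real tuple visibility logic.

```python
from typing import Dict, List, Tuple

COMMITTED, ABORTED = "committed", "aborted"


def visible_rows(rows: List[Tuple[int, str]],
                 clog: Dict[int, str]) -> List[str]:
    """Return the data of rows whose writing Xid is marked committed."""
    return [data for xid, data in rows if clog.get(xid) == COMMITTED]


# Top-level Xid 200 commits; its subtrans, Xid 201, aborted.
# With no Undo, both rows are physically present after recovery.
rows = [(200, "top-level row"), (201, "subtrans row")]

clog_abort_recovered = {200: COMMITTED, 201: ABORTED}
clog_abort_lost = {200: COMMITTED, 201: COMMITTED}

print(visible_rows(rows, clog_abort_recovered))  # only the top-level row
print(visible_rows(rows, clog_abort_lost))       # the aborted row leaks out
```

If recovery fails to restore the subtrans abort, nothing else prevents the aborted row from being read alongside the committed one.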

Another way of putting it - if it was worth writing before a crash, it
is worth recovering after a crash. Shurely?

> > We could allow specification of a subtrans ID to be interpreted the same
> > as specification of its parent main trans. Dunno if that's actually
> > useful to anyone. Actually, I'd think that people would generally
> > specify recovery up to a particular timestamp, and not be interested in
> > xact numbers at all ...
>
> I don't think timestamp is going to be precise enough. Basically I can
> see someone saying I want recovery up to 4am, but anything more specific
> will need xid. I suggested that we write an xlog dump tool so you can
> see the xids (with some xid details) and rough timestamps stored in the
> WAL file and choose the xid for recovery.

Bruce, as I started this e-mail (1st time), I completely agreed with
you. I've now had to switch my thinking.

(Doesn't affect archiving architecture....)

I'm a little dazed....comments anyone?

Best regards, Simon Riggs
