Re: PITR Phase 2 - Design Planning

From: Hans-Jürgen Schönig <postgres(at)cybertec(dot)at>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: PITR Phase 2 - Design Planning
Date: 2004-04-27 07:56:25
Message-ID: 408E1229.3010603@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Simon Riggs wrote:
> Since Phase1 is functioning and should hopefully soon complete, we can
> now start thinking about Phase 2: full recovery to a point-in-time.
>
> Previous thinking was that a command line switch would be used to
> specify recover to a given point in time, rather than the default, which
> will be recover all the way to end of (available) xlogs.
>
> Recovering to a specific point in time forces us to consider what the
> granularity is of time.
> We could recover:
> 1.to end of a full transaction log file
> 2.to end of a full transaction
>
> Transaction log files currently have timestamps, so that is
> straightforward, but probably not the best we can do. We would
> rollforward until the xlog file time > desired point in time.
>
> To make (2) work we would have to have a timestamp associated with each
> transaction. This could be in one of two places:
> 1. the transaction record in the clog
> 2. the log record in the xlog
> We would then recover the xlog record by record, until we found a record
> that had a timestamp > desired point-in-time.
>
> Currently, neither of these places have a timestamp. Hmmmm. We can't use
> pg_control because we are assuming that it needs recovery...
>
> I can't see any general way of adding a timestamp in any less than 2
> bytes. We don't need a timezone. The timestamp could refer to a number
> of seconds since last checkpoint; since this is limited already by a GUC
> to force checkpoints every so often. Although code avoids a checkpoint
> if no updates have taken place, we wouldn't be too remiss to use a
> forced checkpoint every 32,000 seconds (9 hours).
> Assuming that accuracy of the point-in-time was of the order of
> seconds?? If we went to 0.1 second accuracy, we could checkpoint (force)
> every 40 minutes or so. All of that seems too restrictive.
> If we went to milliseconds, then we could use a 4 byte value and use a
> checkpoint (force) every 284 hours or 1.5 weeks.
> Thoughts?
>
> Clog uses 2 bits per transaction, so even 2 bytes extra per transaction
> will make the clog 9 times larger than originally intended. This could
> well cause it to segment quicker, but I'm sure no one would be happy
> with that. So, lets not add anything to the clog.
>
> The alternative is to make the last part of the XlogHeader record a
> timestamp value, increasing each xlog write. It might be possible to
> make this part of the header optional depending upon whether or not PITR
> was required, but then my preference is against such dynamic coding.
>
> So, I propose:
>
> - appending 8 byte date/time data into xlog file header record
> - appending 4 bytes of time offset onto each xlog record
> - altering the recovery logic to compare the calculated time of each
> xlog record (file header + offset) against the desired point-in-time,
> delivered to it by GUC.
>
> Input is sought from anybody with detailed NTP knowledge, since the
> working of NTP drift correction may have some subtle interplay with this
> proposal.
>
> Also, while that code is being altered, some additional log records need
> to be added when recovery of each new xlog starts, with timing, to allow
> DBAs watching a recovery to calculate expected completion times for the
> recovery, which is essential for long recovery situations.
>
> I am also considering any changes that may be required to prepare the
> way for a future implementation of parallel redo recovery.
>
> Best regards, Simon Riggs, 2ndQuadrant
> http://www.2ndquadrant.com
>
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend

Simon,

I have one question which might be important: If we use timestamps
inside the WAL system to find out where to stop. What happens if
somebody changes the time of the system? (e.g. correcting the system
clock by calling ntpdate). Wouldn't it confuse the PITR system? How do
you plan to handle that? Unfortunately time is nothing which can be used
as a key (at least not from my point of view).

Just some lousy ideas early in the morning ...

Regards,

Hans

--
Cybertec Geschwinde u Schoenig
Schoengrabern 134, A-2020 Hollabrunn, Austria
Tel: +43/2952/30706 or +43/664/233 90 75
www.cybertec.at, www.postgresql.at, kernel.cybertec.at

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Karel Zak 2004-04-27 07:59:54 Re: [HACKERS] What can we learn from MySQL?
Previous Message Thomas Hallgren 2004-04-27 07:47:47 Re: Usability, MySQL, Postgresql.org, gborg, contrib, etc.