Re: PITR, checkpoint, and local relations

From: Bruce Momjian <pgman(at)candle(dot)pha(dot)pa(dot)us>
To: "J(dot) R(dot) Nield" <jrnield(at)usol(dot)com>
Cc: PostgreSQL Hacker <pgsql-hackers(at)postgresql(dot)org>, Richard Tucker <richt(at)multera(dot)com>
Subject: Re: PITR, checkpoint, and local relations
Date: 2002-08-01 21:14:04
Message-ID: 200208012114.g71LE4l00726@candle.pha.pa.us
Lists: pgsql-hackers


J.R. needs comments on this. PITR has problems because local relations
aren't included in checkpoints, so there is no lower bound on the WAL
needed to recover them. Suggestions?

---------------------------------------------------------------------------

J. R. Nield wrote:
> As per earlier discussion, I'm working on the hot backup issues as part
> of the PITR support. While I was looking at the buffer manager and the
> relcache/MyDb issues to figure out the best way to work this, it
> occurred to me that PITR will introduce a big problem with the way we
> handle local relations.
>
> The basic problem is that local relations (rd_myxactonly == true) are
> not part of a checkpoint, so there is no way to get a lower bound on the
> starting LSN needed to recover a local relation. In the past this did
> not matter, because either the local file would be (effectively)
> discarded during recovery because it had not yet become visible, or the
> file would be flushed before the transaction creating it made it
> visible. Now this is a problem.
>
> So I need a decision from the core team on what to do about the local
> buffer manager. My preference would be to forget about the local buffer
> manager entirely, or if not that then to allow it only for _true_
> temporary data. The only alternative I can devise is to create some way
> for all other backends to participate in a checkpoint, perhaps using a
> signal. I'm not sure this can be done safely.
>
> Anyway, I'm glad the tuplesort stuff doesn't try to use relation files
> :-)
>
> Can the core team let me know if this is acceptable, and whether I
> should move ahead with changes to the buffer manager (and some other
> stuff) needed to avoid special treatment of rd_myxactonly relations?
>
> Also to Richard: have you guys at Multera dealt with this issue already?
> Is there some way around this that I'm missing?
>
>
> Regards,
>
> John Nield
>
>
>
>
> Just as an example of this problem, imagine the following sequence:
>
> 1) Transaction TX1 creates a local relation LR1 which will eventually
> become a globally visible table. Tuples are inserted into the local
> relation, and logged to the WAL file. Some tuples remain in the local
> buffer cache and are not yet written out, although they are logged. TX1
> is still in progress.
>
> 2) Backup starts, and checkpoint is called to get a minimum starting LSN
> (MINLSN) for the backed-up files. Only the global buffers are flushed.
>
> 3) Backup process copies LR1 into the backup directory. (postulate some
> way of coordinating with the local buffer manager, a problem I have not
> solved).
>
> 4) TX1 commits and flushes its local buffers. A dirty buffer exists
> whose LSN is before MINLSN. LR1 becomes globally visible.
>
> 5) Backup finishes copying all the files, including the local relations,
> and then flushes the log. The log files between MINLSN and the current
> LSN are copied to the backup directory, and backup is done.
>
> 6) Sometime later, a system administrator restores the backup and plays
> the logs forward starting at MINLSN. LR1 will be corrupt, because some
> of the log entries required for its restoration will be before MINLSN.
> This corruption will not be detected until something goes wrong.
>
> BTW: The problem doesn't only happen with backup! It occurs at every
> checkpoint as well; I just missed it until I started working on the hot
> backup issue.
>
> --
> J. R. Nield
> jrnield(at)usol(dot)com
>
>
>
>
>
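
The six-step sequence quoted above can be modeled with a small
simulation. This is only a sketch under loud assumptions: Cluster,
restore, and run_scenario are hypothetical names invented here and are
not PostgreSQL code, the LSN is a plain counter, and a "page" is a single
value. The include_local flag models one possible remedy (letting the
checkpoint flush the backend-local buffers too); it is not a claim about
what the buffer manager does or will do.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Toy model: a WAL, on-disk pages, and shared vs. backend-local buffers."""
    wal: list = field(default_factory=list)     # (lsn, relation, page, value)
    disk: dict = field(default_factory=dict)    # (relation, page) -> value
    shared: dict = field(default_factory=dict)  # dirty shared buffers
    local: dict = field(default_factory=dict)   # dirty local buffers (one backend)

    def write(self, relation, page, value, local):
        """Log the change to WAL first, then dirty a local or shared buffer."""
        lsn = len(self.wal) + 1
        self.wal.append((lsn, relation, page, value))
        target = self.local if local else self.shared
        target[(relation, page)] = value
        return lsn

    def flush(self, buffers):
        """Write every dirty buffer in the given pool to disk."""
        self.disk.update(buffers)
        buffers.clear()

    def checkpoint(self, include_local):
        """Flush buffers and return the redo start point (MINLSN)."""
        self.flush(self.shared)
        if include_local:  # assumption: hypothetical fix, not current behavior
            self.flush(self.local)
        return len(self.wal) + 1


def restore(backup_disk, wal, minlsn):
    """Start from the copied files and replay WAL records at or after MINLSN."""
    disk = dict(backup_disk)
    for lsn, relation, page, value in wal:
        if lsn >= minlsn:
            disk[(relation, page)] = value
    return disk


def run_scenario(include_local):
    c = Cluster()
    # Step 1: TX1 fills LR1; page 0 reaches disk, page 1 stays in a local buffer.
    c.write("LR1", 0, "tuple-A", local=True)
    c.flush(c.local)
    c.write("LR1", 1, "tuple-B", local=True)
    c.write("T", 0, "other", local=False)
    minlsn = c.checkpoint(include_local)    # step 2: backup starts
    backup = dict(c.disk)                   # step 3: files copied, LR1 included
    c.flush(c.local)                        # step 4: TX1 commits, flushes late
    c.write("T", 1, "later", local=False)   # unrelated activity during backup
    wal_copy = [r for r in c.wal if r[0] >= minlsn]  # step 5: log since MINLSN
    return restore(backup, wal_copy, minlsn)         # step 6: restore and replay


broken = run_scenario(include_local=False)
fixed = run_scenario(include_local=True)
```

With include_local=False the restored cluster has page 0 of LR1 but not
page 1: the record for "tuple-B" was logged before MINLSN, so it is in
neither the copied file nor the replayed log. With include_local=True the
page is already on disk when the files are copied, so it survives the
restore.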

--
Bruce Momjian | http://candle.pha.pa.us
pgman(at)candle(dot)pha(dot)pa(dot)us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup. | Drexel Hill, Pennsylvania 19026
