Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

From: Scott Mead <scott(at)scottrmead(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Glen Parker <glenebob(at)nwlink(dot)com>, pgsql-general(at)postgresql(dot)org, Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
Subject: Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
Date: 2010-11-17 15:38:24
Message-ID: AANLkTikU303jGY2a-AvizL=o-3hOn6SSHVj-FgLx-NLu@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Tue, Nov 16, 2010 at 10:25 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Man, the number of misunderstandings in this thread is staggering.
> Let me try to explain what the proposed feature will and will not do.
>
> 1. The system catalog entries for all tables will be wal-logged.
> So schema (DDL) will survive a crash. There wouldn't be any way
> to make it not do that, because we can't wal-log only some updates
> to a particular table, and that includes the catalogs in particular.
>
> Gotcha

> 2. What's proposed as the new feature is that specific non-system
> tables can be marked as unlogged, meaning that WAL entries won't
> be made for changes in those tables' contents (nor their indexes'
> contents). So we can't guarantee that the contents of such tables
> will be correct or consistent after a crash. The proposed feature
> deals with this by forcibly truncating all such tables after a crash,
> thus ensuring that they're consistent though not populated. So the
> possible use-cases for such tables are limited to where (a) you can
> repopulate the tables on demand, or (b) you don't really care about
> losing data on a crash.
>

I would rather be allowed to decide that for myself.

>
> 3. There's a lot of wishful thinking here about what constitutes a
> crash. A backend crash *is* a crash, even if the postmaster keeps
> going. Data that had been in shared buffers doesn't get written out
> in such a scenario (and if we tried, it might be corrupt anyway). So
> unlogged tables would be corrupt and in need of truncation after such an
> event. Obviously, the same goes for an OS-level crash or power failure.
>

Right, just let *me* decide, that's all.

>
> 4. The last bit of discussion on -hackers concerned what to do in
> the case where the server got shut down cleanly. If it was shut
> down cleanly, then any data for unlogged tables would have been
> written out from shared buffers ... but did the data make it to disk?
> There's no easy way to know that. In the event of an OS crash or
> power failure shortly after server shutdown, it's possible that
> the unlogged tables would be corrupt. So Robert's initial proposal
> includes truncating unlogged tables at any database startup, even
> if the previous shutdown was clean. Some (including me) are arguing
> that that is unnecessarily strict; but you do have to realize that
> you're taking some risk with data validity if it doesn't do that.
>

It is too strict, it makes the feature barely more usable than a temp table.
As a DBA, I realize the implication of the feature:
*) b0rked indexes
*) b0rked data
*) Not knowing what's good and what's bad
*) Bad reports
*) Bad Bi

etc..., etc... etc...

Still, I'd rather be allowed to make the decision here. I think that
having the database try to enforce integrity on something i've marked as
'corruptable' (via the 'unlogged' flag) will be a constant fight between me
and the system. In the end, I'd just not use the feature.

> The bottom line here is that you really can only use the feature
> for data that you're willing to accept losing on no notice.
> Allowing the data to persist across clean shutdowns would probably
> improve usability a bit, but it's not changing that fundamental fact.
>

Agreed, and that's fine. IMHO, it improves the usability 10 fold. Having
it truncated on server restart is useful for only a fraction of the
use-cases for this feature.

--Scott

>
> regards, tom lane
>

In response to

Responses

Browse pgsql-general by date

  From Date Subject
Next Message Magnus Hagander 2010-11-17 15:39:37 Re: The first dedicated PostgreSQL forum
Previous Message Andreas Kretschmer 2010-11-17 15:37:28 Re: Storing old and new tuple values after an UPDATE, INSERT or DELETE