Re: Survey on backing up unlogged tables: help us with PostgreSQL development!

From: Scott Mead <scott(at)scottrmead(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Glen Parker <glenebob(at)nwlink(dot)com>, pgsql-general(at)postgresql(dot)org, Karsten Hilbert <Karsten(dot)Hilbert(at)gmx(dot)net>
Subject: Re: Survey on backing up unlogged tables: help us with PostgreSQL development!
Date: 2010-11-17 15:41:11
Message-ID: AANLkTikG+KpKMO8Y=z7YftO8HT8K8PLcREfVg7+887kR@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-general

On Wed, Nov 17, 2010 at 10:38 AM, Scott Mead <scott(at)scottrmead(dot)com> wrote:

> On Tue, Nov 16, 2010 at 10:25 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
>> Man, the number of misunderstandings in this thread is staggering.
>> Let me try to explain what the proposed feature will and will not do.
>>
>> 1. The system catalog entries for all tables will be wal-logged.
>> So schema (DDL) will survive a crash. There wouldn't be any way
>> to make it not do that, because we can't wal-log only some updates
>> to a particular table, and that includes the catalogs in particular.
>>
>> Gotcha
>
>
>> 2. What's proposed as the new feature is that specific non-system
>> tables can be marked as unlogged, meaning that WAL entries won't
>> be made for changes in those tables' contents (nor their indexes'
>> contents). So we can't guarantee that the contents of such tables
>> will be correct or consistent after a crash. The proposed feature
>> deals with this by forcibly truncating all such tables after a crash,
>> thus ensuring that they're consistent though not populated. So the
>> possible use-cases for such tables are limited to where (a) you can
>> repopulate the tables on demand, or (b) you don't really care about
>> losing data on a crash.
>>
>
> I would rather be allowed to decide that for myself.
>
>
>>
>> 3. There's a lot of wishful thinking here about what constitutes a
>> crash. A backend crash *is* a crash, even if the postmaster keeps
>> going. Data that had been in shared buffers doesn't get written out
>> in such a scenario (and if we tried, it might be corrupt anyway). So
>> unlogged tables would be corrupt and in need of truncation after such an
>> event. Obviously, the same goes for an OS-level crash or power failure.
>>
>
> Right, just let *me* decide, that's all.
>
>
>>
>> 4. The last bit of discussion on -hackers concerned what to do in
>> the case where the server got shut down cleanly. If it was shut
>> down cleanly, then any data for unlogged tables would have been
>> written out from shared buffers ... but did the data make it to disk?
>> There's no easy way to know that. In the event of an OS crash or
>> power failure shortly after server shutdown, it's possible that
>> the unlogged tables would be corrupt. So Robert's initial proposal
>> includes truncating unlogged tables at any database startup, even
>> if the previous shutdown was clean. Some (including me) are arguing
>> that that is unnecessarily strict; but you do have to realize that
>> you're taking some risk with data validity if it doesn't do that.
>>
>
> It is too strict, it makes the feature barely more usable than a temp
> table.
> As a DBA, I realize the implication of the feature:
> *) b0rked indexes
> *) b0rked data
> *) Not knowing what's good and what's bad
> *) Bad reports
> *) Bad Bi
>
> etc..., etc... etc...
>
> Still, I'd rather be allowed to make the decision here. I think that
> having the database try to enforce integrity on something i've marked as
> 'corruptable' (via the 'unlogged' flag) will be a constant fight between me
> and the system. In the end, I'd just not use the feature.
>
>
>> The bottom line here is that you really can only use the feature
>> for data that you're willing to accept losing on no notice.
>> Allowing the data to persist across clean shutdowns would probably
>> improve usability a bit, but it's not changing that fundamental fact.
>>
>
> Agreed, and that's fine. IMHO, it improves the usability 10 fold. Having
> it truncated on server restart is useful for only a fraction of the
> use-cases for this feature.
>

Now that I've just sent that last piece, what about a 'truncate on restart'
option that is defaulted to on? That way, the community feels good knowing
that we're trying to protect people from themselves, but like the 'fsync'
feature, I can load the gun and pull the trigger if I really want to. I'd
like to see that so even if there is a server crash, it doesn't truncate.
That way, i can rename the garbage table if I want, create a new one for all
new data and then be allowed to glean what I can from the last one.

--Scott

>
> --Scott
>
>
>>
>> regards, tom lane
>>
>
>

In response to

Browse pgsql-general by date

  From Date Subject
Next Message Pavel Stehule 2010-11-17 15:46:19 Re: median for postgresql 8.3
Previous Message Magnus Hagander 2010-11-17 15:39:37 Re: The first dedicated PostgreSQL forum