Re: making an unlogged table logged

From: Rob Wultsch <wultsch(at)gmail(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: making an unlogged table logged
Date: 2011-01-05 03:56:52
Message-ID: AANLkTi=V6F3La5fyqFbmCcU8r4Yi4j7W8=ex5=ucctCC@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 4, 2011 at 7:41 PM, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> Somebody asked about this on Depesz's blog today, and I think it's
> come up here before too, so I thought it might be worth my writing up
> a few comments on this.  I don't think I'm going to have time to work
> on this any time soon, but if someone else wants to work up a patch,
> I'm game to review.  I think it'd clearly be a good feature.
>
> Generally, to do this, it would be necessary to do the following
> things (plus anything I'm forgetting):
>
> 1. Take an AccessExclusiveLock on the target table.  You might think
> that concurrent selects could be allowed, but I believe that's not the
> case.  Read on.
>
> 2. Verify that there are no foreign keys referencing other unlogged
> tables, because if that were the case then after the change we'd have
> a permanent table referencing an unlogged table, which would violate
> referential integrity.  (Note that unlogged referencing permanent is
> OK, but permanent referencing unlogged is a no-no, so what matters
> when upgrading is "outbound" foreign keys.)
>
> 3. Write out all shared buffers for the target table, and drop them.
> This ensures that there are no buffers floating around for the target
> relation that are marked BM_UNLOGGED, which would be a bad thing.  Or
> maybe it's possible to just clear the BM_UNLOGGED flag, instead of
> dropping them.  This is the step that makes me think we need an access
> exclusive lock - otherwise, somebody else might read in a buffer and,
> seeing that the relation is unlogged (which is true, since we haven't
> committed yet), mark it BM_UNLOGGED.
>
> 4. fsync() any segments of the target relation - of any fork except
> that init fork - that might have dirty pages not on disk.
>
> 5. Arrange for the appropriate file deletions at commit or abort, by
> updating pendingDeletes.  On commit, we want to delete the init fork
> for the table and all its indexes.  On abort, we want to delete
> everything else, but only for pretend; that is, the abort record
> should reflect the deletions since they'll need to happen on any
> standbys, but we shouldn't actually perform them on the master since
> we don't want to obliterate the contents of the table for no reason.
> There's a subtle problem here I'm not quite sure how to deal with:
> what happens if we *crash* without writing an abort record?  It seems
> like that could leave a stray file around on a standby, because the
> current code only cleans things up on the standby at the start of
> recovery; to make this bullet-proof, I suppose it'd need to repeat
> that every time a crash happens on the master, but I don't know how to
> do that.  Note also that if wal_level is minimal, then we need only
> worry about the commit case; the abort case can be a no-op.
>
> 6. If wal_level != minimal, XLOG every page of every fork except the
> init fork, for both the table and the associated indexes. (Note that
> this step also requires an AccessExclusiveLock rather than some weaker
> lock, because of the arbitrary rule that only AccessExclusiveLocks are
> sent to standbys.  If we held only ShareRowExclusiveLock on the
> master, for example, a Hot Standby backend might read the table while
> it's only been half-copied.)
>
> 7. Update pg_class.relpersistence from 'u' to 'p', for both the table
> and the associated indexes.
>
> Going the other direction ought to be possible too, although it seems
> somewhat less useful.  For that, you'd need to flip around the check
> in step #2 (i.e. check for a reference FROM a permanent table),
> perform step #3, skip step #4, do step #5 backwards (create and log
> init forks, arranging for them to be removed on abort - this too has
> an issue with crashes that don't leave abort records behind); and
> arrange for the rest of the forks to be removed on commit on any
> standby without doing it on the master), skip step #6, and do step #7
> backwards.
>
> --
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company

A couple thoughts:
1. Could the making a table logged be a non-exclusive lock if the
ALTER is allowed to take a full checkpoint?
2. Unlogged to logged has giant use case.
3. In MySQL I have had to ALTER tables to engine BLACKHOLE because
they held data that was not vital, but the server was out of IO. Going
logged -> unlogged has a significant placed, I think.

--
Rob Wultsch
wultsch(at)gmail(dot)com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Shigeru HANADA 2011-01-05 03:58:36 Re: SQL/MED - core functionality
Previous Message Robert Haas 2011-01-05 03:24:20 Re: Streaming replication as a separate permissions