Re: getting rid of freezing

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Andres Freund <andres(at)2ndquadrant(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: getting rid of freezing
Date: 2013-05-24 02:09:02
Message-ID: CA+TgmoZMAPbJ554JuT68jGM4Ye3TeMUJGE3=VaCBDGKxAdh0Jw@mail.gmail.com
Lists: pgsql-hackers

On Thu, May 23, 2013 at 1:51 PM, Andres Freund <andres(at)2ndquadrant(dot)com> wrote:
> So, what I propose instead is basically:
> 1) only vacuum non-all-visible pages, even when doing it for
> anti-wraparound

Check. We might want an option to force a scan of the whole relation.

> 2) When we can set all-visible, guarantee that all tuples on the page are
> fully hinted. During recovery do the same, so we don't need to log
> all hint bits.
> We can do this with only an exclusive lock on the buffer, we don't
> need a cleanup lock.

I don't think this works. Emitting XLOG_HEAP_VISIBLE for a heap page
does not emit an FPI (full-page image) for the heap page, only (if
needed) for the visibility map page. So a subsequent crash that tears
the page could keep XLOG_HEAP_VISIBLE but lose other changes on the
page, i.e. the hint bits.
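
To make the failure concrete, here is a toy model in C. It is not PostgreSQL code: the four-sector page layout and every name in it (ToyPage, torn_write, redo_heap_visible, invariant_holds, crash_after_set_visible) are illustrative assumptions. It assumes, as described above, that the hint bits are set without WAL and that XLOG_HEAP_VISIBLE carries no image of the heap page, so replaying it can only set the flag on whatever version of the page reached disk.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model, not PostgreSQL source.  A heap page is NSECTORS sectors; the
 * header (with the all-visible flag) lives in sector 0, and each sector also
 * holds one tuple whose hint bit may or may not be set.
 */
#define NSECTORS 4

typedef struct ToyPage
{
    bool all_visible;           /* header flag, stored in sector 0 */
    bool hinted[NSECTORS];      /* hint bit of the tuple in each sector */
} ToyPage;

/* Copy only the first nsectors sectors from memory to disk (a torn write). */
static void
torn_write(ToyPage *disk, const ToyPage *mem, size_t nsectors)
{
    assert(nsectors <= NSECTORS);
    for (size_t i = 0; i < nsectors; i++)
    {
        if (i == 0)
            disk->all_visible = mem->all_visible;
        disk->hinted[i] = mem->hinted[i];
    }
}

/* Redo of the visible record: with no heap FPI, all redo can do is set the
 * flag on whatever version of the page is on disk. */
static void
redo_heap_visible(ToyPage *disk)
{
    disk->all_visible = true;
}

/* The proposed invariant: an all-visible page has every tuple hinted. */
static bool
invariant_holds(const ToyPage *p)
{
    if (!p->all_visible)
        return true;
    for (size_t i = 0; i < NSECTORS; i++)
        if (!p->hinted[i])
            return false;
    return true;
}

/* Hint all tuples and set all-visible in memory, log only the visible
 * record, persist `persisted` sectors, crash, replay.  Returns the page. */
static ToyPage
crash_after_set_visible(size_t persisted)
{
    ToyPage disk = {0};          /* before: not all-visible, nothing hinted */
    ToyPage mem = disk;

    for (size_t i = 0; i < NSECTORS; i++)
        mem.hinted[i] = true;    /* hint bits: not WAL-logged */
    mem.all_visible = true;      /* covered only by XLOG_HEAP_VISIBLE */

    torn_write(&disk, &mem, persisted);   /* then the crash */
    redo_heap_visible(&disk);             /* recovery replays the record */
    return disk;
}
```

crash_after_set_visible(NSECTORS) models a complete write and keeps the proposed "all-visible implies fully hinted" invariant; crash_after_set_visible(1) models a torn write, and the recovered page claims to be all-visible while the tuples in sectors 1 to 3 are unhinted.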

> 3) When we cannot mark a page all-visible or we cannot get the cleanup
> lock, remember the oldest xmin on that page. We could set all visible
> in the former case, but we want the page to be cleaned up sometime
> soonish.

I think you mean "in the latter case" not "in the former case". If
not, then I'm confused.

> 4) If we can get the cleanup lock, purge dead tuples from the page and
> the indexes, just as today. Set the page as all-visible.
>
> That way we know that any page that is all-visible doesn't ever need to
> look at xmin/xmax since we are sure to have set all relevant hint
> bits.
>
> We don't even necessarily need to log the hint bits for all items since
> the redo for all_visible could make sure all items are hinted. The only
> problem is knowing up to where we can truncate pg_clog...

The redo for all_visible cannot make sure all items are hinted.
Again, there's no FPI on the heap page. The heap page could in fact
contain dead tuples at the time we mark it all-visible. Consider, for
example:

0. Checkpoint.
1. The buffer becomes all visible.
2. A tuple is inserted, making the buffer not-all-visible.
3. The page is written by the OS.
4. Crash.

Now, recovery will first find the record marking the buffer
all-visible, and will mark it all-visible. Now the all-visible bit on
the page is flat-out wrong, but it doesn't matter because we haven't
reached consistency. Next we'll find the heap-insert record, which
will have an FPI, since it's the first WAL-logged change to the buffer
since the last checkpoint. Now the FPI fixes everything and we're
back in a sane state.
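
The replay order above can be traced with a small C model. Again this is only a sketch: ToyPage, ToyRecord, redo() and crash_and_recover() are hypothetical names, and the model assumes that a record's full-page image is taken after the change, so replaying it simply copies the image over the on-disk page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not PostgreSQL source. */
#define MAX_TUPLES 8

typedef struct ToyPage
{
    bool   all_visible;
    size_t ntuples;
    int    tuples[MAX_TUPLES];
} ToyPage;

typedef enum ToyRecType { REC_HEAP_VISIBLE, REC_HEAP_INSERT } ToyRecType;

typedef struct ToyRecord
{
    ToyRecType type;
    bool       has_fpi;     /* carries a full-page image of the heap page? */
    ToyPage    fpi;         /* image taken after the change (if has_fpi) */
    int        value;       /* inserted tuple, for REC_HEAP_INSERT */
} ToyRecord;

/* Replay one record against the on-disk page. */
static void
redo(ToyPage *disk, const ToyRecord *rec)
{
    if (rec->has_fpi)
    {
        *disk = rec->fpi;    /* the image replaces whatever is on disk */
        return;
    }
    switch (rec->type)
    {
        case REC_HEAP_VISIBLE:
            disk->all_visible = true;
            break;
        case REC_HEAP_INSERT:
            assert(disk->ntuples < MAX_TUPLES);
            disk->tuples[disk->ntuples++] = rec->value;
            disk->all_visible = false;
            break;
    }
}

/* Steps 0-4 from the mail, then recovery.  If after_visible_redo is not
 * NULL, it receives the page as it looks right after the visible record
 * has been replayed. */
static ToyPage
crash_and_recover(bool page_written_before_crash, ToyPage *after_visible_redo)
{
    /* 0. Checkpoint: disk and memory agree. */
    ToyPage mem = { .all_visible = false, .ntuples = 1, .tuples = { 10 } };
    ToyPage disk = mem;
    ToyRecord wal[2];

    /* 1. The page becomes all-visible; no heap FPI is logged. */
    mem.all_visible = true;
    wal[0] = (ToyRecord){ .type = REC_HEAP_VISIBLE, .has_fpi = false };

    /* 2. An insert clears all-visible.  It is the first change since the
     * checkpoint that logs the heap page, so it carries an FPI. */
    mem.tuples[mem.ntuples++] = 20;
    mem.all_visible = false;
    wal[1] = (ToyRecord){ .type = REC_HEAP_INSERT, .has_fpi = true,
                          .fpi = mem, .value = 20 };

    /* 3. The OS may or may not write the page; 4. crash. */
    if (page_written_before_crash)
        disk = mem;

    /* Recovery replays everything after the checkpoint, in order. */
    redo(&disk, &wal[0]);
    if (after_visible_redo != NULL)
        *after_visible_redo = disk;
    redo(&disk, &wal[1]);
    return disk;
}
```

With page_written_before_crash true, the page captured after replaying the visible record has all_visible set and two tuples, which is the flat-out wrong state; the insert's FPI then restores the correct page. Passing false yields the same final page.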

Now in this particular case it wouldn't hurt anything if the redo
routine that set the all-visible bit also hinted all the tuples,
because the FPI is going to overwrite it anyway. But suppose in lieu
of steps (3) and (4) we write half of the page and then crash, leaving
behind a torn page. Now it's pretty crazy to think about trying to
hint tuples; the page may be in a completely insane state.
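
The torn-page variant can be modeled the same way. In this hypothetical sketch (ToyPage, write_first_half, redo_visible_and_hint and torn_page_recovery are invented for illustration), the first half of the page holds the header and item count, loosely mirroring how a heap page keeps its line pointers near the start and tuple data near the end. Only that first half reaches disk, so the new tuple's body is missing, and a redo routine that tries to hint every tuple meets an item with nothing behind it; real code would be interpreting garbage at that point.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not PostgreSQL source. */
#define MAX_TUPLES 4

typedef struct ToyTuple
{
    bool present;           /* did this tuple body reach disk? */
    bool hinted;
    int  value;
} ToyTuple;

/* First half: header and item count (like the line-pointer array).
 * Second half: tuple bodies. */
typedef struct ToyPage
{
    bool     all_visible;
    size_t   nitems;
    ToyTuple tuples[MAX_TUPLES];
} ToyPage;

/* Torn write: only the first half of the page reaches disk. */
static void
write_first_half(ToyPage *disk, const ToyPage *mem)
{
    disk->all_visible = mem->all_visible;
    disk->nitems = mem->nitems;
}

/* Hypothetical redo of the all-visible record that also hints every tuple.
 * Returns false on meeting an item whose tuple body is not on disk. */
static bool
redo_visible_and_hint(ToyPage *disk)
{
    assert(disk->nitems <= MAX_TUPLES);
    disk->all_visible = true;
    for (size_t i = 0; i < disk->nitems; i++)
    {
        if (!disk->tuples[i].present)
            return false;   /* real code would be reading garbage here */
        disk->tuples[i].hinted = true;
    }
    return true;
}

/* Steps 0-2 from the mail, then a half-page write and a crash.  Returns
 * whether the hinting redo saw a consistent page; *final_page gets the
 * page after the insert's FPI has been replayed. */
static bool
torn_page_recovery(ToyPage *final_page)
{
    ToyPage mem = { .all_visible = false, .nitems = 1,
                    .tuples = { { .present = true, .value = 10 } } };
    ToyPage disk = mem;                 /* 0. checkpoint */

    mem.all_visible = true;             /* 1. visible record, no heap FPI */

    mem.tuples[mem.nitems++] = (ToyTuple){ .present = true, .value = 20 };
    mem.all_visible = false;            /* 2. insert, logged with an FPI */
    ToyPage fpi = mem;

    write_first_half(&disk, &mem);      /* half the page is written; crash */

    bool hint_ok = redo_visible_and_hint(&disk);  /* replay record 1 */
    disk = fpi;                                   /* replay record 2 */
    *final_page = disk;
    return hint_ok;
}
```

torn_page_recovery() returns false because the hinting pass runs into the missing tuple, yet the final page is still correct once the insert's FPI overwrites it. The hinting step is the unsafe part, not the end state.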

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
