Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

From: Andres Freund <andres(at)anarazel(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Noah Misch <noah(at)leadboat(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Steve Kehlet <steve(dot)kehlet(at)gmail(dot)com>, Forums postgresql <pgsql-general(at)postgresql(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Date: 2015-06-08 13:15:04
Message-ID: 20150608131504.GH24997@alap3.anarazel.de
Lists: pgsql-general pgsql-hackers

On 2015-06-05 16:56:18 -0400, Tom Lane wrote:
> Andres Freund <andres(at)anarazel(dot)de> writes:
> > On June 5, 2015 10:02:37 PM GMT+02:00, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> >> I think we would be foolish to rush that part into the tree. We
> >> probably got here in the first place by rushing the last round of
> >> fixes too much; let's try not to double down on that mistake.
>
> > My problem with that approach is that I think the code has gotten significantly more complex in the last few weeks. I have very little trust that the interactions between vacuum, the deferred truncations in the checkpointer, the state management in shared memory and recovery are correct. There are just too many non-local subtleties here.
>
> > I don't know what the right thing to do here is.
>
> My gut feeling is that rushing to make a release date is the wrong thing.
>
> If we have confidence that we can ship something on Monday that is
> materially more trustworthy than the current releases, then let's aim to
> do that; but let's ship only patches we are confident in. We can do
> another set of releases later that incorporate additional fixes. (As some
> wise man once said, there's always another bug.)

I've tortured hardware a fair bit with HEAD. So far it looks much better
than 9.4.2+ et al. I've noticed a bunch of, to me at least, new issues:

1) the autovacuum trigger logic isn't perfect yet. I.e. especially with
autovacuum=off you can get into situations where emergency vacuums
aren't started when necessary. This is particularly likely to happen
if either very large multixacts are used, or if the server has been
shut down while emergency autovacuums were happening. No corruption
ensues, but it's not easy to get out of.

2) I've managed to corrupt a cluster when a standby performed
restartpoints less frequently than the master performed
checkpoints. Because truncations happen in the checkpointer it's not
that hard to end up with entirely full multixact slrus. This is a
problem on several fronts. We can IIUC end up truncating away the
wrong data, and we can be in a bad state upon promotion. None of that
is new.

3) It's really confusing that truncation (and thus the limits in shared
memory) happens in checkpoints. If you hit a limit and manually do all
the necessary vacuums you'll see a "good" limit in
pg_database.datminmxid, but you'll still run into the error. You manually
have to force a checkpoint for the truncation to actually
happen. That's particularly problematic because larger installations,
where I presume wraparound issues are more likely, often have a large
checkpoint_timeout setting.
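
To make the workaround in 3) concrete, here is a minimal sketch, not a
tested procedure. It assumes a Python DB-API cursor (for example from
psycopg2) on a superuser connection in autocommit mode; the function
names are made up for illustration. It reads the per-database limit and
then requests the checkpoint that actually performs the truncation.

```python
def read_datminmxid(cur):
    """Return {database name: datminmxid} via a DB-API cursor (assumed)."""
    # Cast to text so the driver does not need to understand the xid type.
    cur.execute("SELECT datname, datminmxid::text FROM pg_database")
    return {name: int(mxid) for name, mxid in cur.fetchall()}


def force_checkpoint(cur):
    """Ask the server for an immediate checkpoint (requires superuser)."""
    # Per the description above, truncation and the shared-memory limits
    # only advance at checkpoint time, so the vacuums alone are not enough.
    cur.execute("CHECKPOINT")
```

Usage would be to run the necessary vacuums first, confirm the values
from read_datminmxid() look sane, and then call force_checkpoint()
instead of waiting for checkpoint_timeout to expire.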

Since none of these are really new, I don't think they should prevent us
from doing a back branch release. While I'm still not convinced we're
better off with 9.4.4 than with 9.4.1, we're certainly better off than
with 9.4.[23] et al.

If we want to go ahead with the release I plan to do a bit more testing
today and tomorrow. If not I'm first going to continue working on fixing
the above.

I've a "good" fix for 1). I'm not 100% sure I'll feel confident with
pushing if we wrap today. I am wondering if we shouldn't at least apply
the portion that unconditionally sends a signal in the ERROR
case. That's still an improvement.

One more thing:
Our testing infrastructure sucks. Without writing C code it's basically
impossible to test wraparounds and such. Even if not particularly useful
for non-devs, I really think we should have functions for
creating/burning xids/multixacts in core. Or at least in some extension.
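
For illustration only, this is roughly what burning xids from the SQL
level looks like today. It is a sketch assuming a Python DB-API cursor
in autocommit mode, and burn_xids is a hypothetical helper, not the
proposed in-core function. Each call to txid_current() in its own
transaction consumes one xid, which is far too slow to get anywhere
near wraparound, and there is no comparable shortcut for multixacts.

```python
def burn_xids(cur, count):
    """Consume `count` xids, one per autocommitted statement.

    Returns the last value reported by txid_current(), or None if
    count is zero. `cur` is assumed to be a DB-API cursor whose
    connection is in autocommit mode, so every execute() runs in its
    own transaction and therefore gets a fresh xid.
    """
    last = None
    for _ in range(count):
        cur.execute("SELECT txid_current()")
        last = cur.fetchone()[0]
    return last
```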
