Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, "Joshua D(dot) Drake" <jd(at)commandprompt(dot)com>, Steve Kehlet <steve(dot)kehlet(at)gmail(dot)com>, Forums postgresql <pgsql-general(at)postgresql(dot)org>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [GENERAL] 9.4.1 -> 9.4.2 problem: could not access status of transaction 1
Date: 2015-06-05 20:40:39
Message-ID: C06881F5-9AA9-42D2-8705-A8E6E971550E@anarazel.de
Lists: pgsql-general pgsql-hackers

On June 5, 2015 10:02:37 PM GMT+02:00, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
>On Fri, Jun 5, 2015 at 2:47 PM, Andres Freund <andres(at)anarazel(dot)de> wrote:
>> On 2015-06-05 14:33:12 -0400, Tom Lane wrote:
>>> Robert Haas <robertmhaas(at)gmail(dot)com> writes:
>>> > 1. The problem that we might truncate an SLRU members page away when
>>> > it's in the buffers, but not drop it from the buffers, leading to a
>>> > failure when we try to write it later.
>>
>> I've got a fix for this, and about three other issues I found during
>> development of the new truncation codepath.
>>
>> I'll commit the fix tomorrow.
>
>OK. Then I think we should release next week, so we get the fixes we
>have out before PGCon. The current situation is not good.
>
>>> > I think we might want to try to fix one or both of those before
>>> > cutting a new release. I'm less sold on the idea of installing
>>> > WAL-logging in this minor release. That probably needs to be done,
>>> > but right now we've got stuff that worked in early 9.3.X release and
>>> > is now broken, and I'm in favor of fixing that first.
>>
>> I've implemented this, and so far it removes more code than it
>> adds. It's imo also a pretty clear win in how understandable the code
>> is. The remaining work, besides testing, is primarily going over lots
>> of comments and updating them. Some of them are outdated by the patch,
>> and some already were.
>>
>> Will post tonight, together with the other fixes, after I get back from
>> climbing.
>>
>> My gut feeling right now is that it's a significant improvement, and
>> that it'll be reasonable to include it. But I'd definitely like some
>> independent testing for it, and I'm not sure if that's doable in time
>> for the wrap.
>
>I think we would be foolish to rush that part into the tree. We
>probably got here in the first place by rushing the last round of
>fixes too much; let's try not to double down on that mistake.

My problem with that approach is that I think the code has gotten significantly more complex in the last few weeks. I have very little trust that the interactions between vacuum, the deferred truncations in the checkpointer, the state management in shared memory and recovery are correct. There are just too many non-local subtleties here.

I don't know what the right thing to do here is.

---
Please excuse brevity and formatting - I am writing this on my mobile phone.
