Re: [CORE] postpone next week's release

From: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
To: Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Stephen Frost <sfrost(at)snowman(dot)net>, Magnus Hagander <magnus(at)hagander(dot)net>, "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>, pgsql-core <pgsql-core(at)postgresql(dot)org>
Subject: Re: [CORE] postpone next week's release
Date: 2015-06-04 08:51:44
Message-ID: 557011A0.6070604@iki.fi
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 05/30/2015 11:47 PM, Andres Freund wrote:
> I don't think it's primarily a problem of lack of review; although that
> is a large problem. I think the biggest systematic problem is that the
> compound complexity of postgres has increased dramatically over the
> years. Features have added complexity little by little, each not
> incrementally not looking that bad. But very little has been done to
> manage complexity. Since 8.0 the codesize has roughly doubled, but
> little has been done to manage the increased complexity. Few new
> abstractions have been introduced and the structure of the code is
> largely the same.
>
> As a somewhat extreme example, let's look at StartupXLOG(). In 8.0 it
> was ~500 LOC, in master it's ~1500. The interactions in 8.0 were
> complex, they have gotten much more complex since. It fullfills lots of
> different roles, all in one function:
>
> (roughly in the order things happen, but simplified)
> * Read the control file/determine whether we crashed
> * recovery.conf handling
> * backup label handling
> * tablespace map handling (huh, I missed that this was added directly to
> StartupXLOG. What a bad idea)
> * Determine whether we're doing archive recovery, read the relevant
> checkpoint if so
> * relcache init file removal
> * timeline switch handling
> * Loading the checkpoint we're starting from
> * Initialization of a lot of subsystems
> * crash recovery/replay
> * Including pgstat, unlogged table, exported snapshot handling
> * iff hot standby, some more subsystems are initialized here
> * hot standby state handling
> * replay process intialization
> * crash replay itself, including
> * progress tracking
> * recovery pause handling
> * nextxid tracking
> * timeline increase handling
> * hot standby state handling
> * unlogged relations handling
> * archive recovery handling
> * creation/initialization of the end of recovery checkpoint
> * timeline increment if failover
> * subsystem initialization iff !hot_standby
> * end of recovery actions
>
> Yes. that's one routine. And, to make things even funnier, half of that
> routine isn't exercised by our tests.
>
> You can argue that this is an outlier, but I don't think so. Heapam, the
> planner, etc. have similar cases.
>
> And I think this, to some degree, explains a lot of the multixact
> problems. While there were a few "simple bugs", most of them were
> interactions between the various subsystems that are rather intricate.

I think this explanation is wrong. I agree that there are many places
that would be good to refactor - like StartupXLOG() - but the multixact
code was not too bad in that regard. IIRC the patch included some
refactoring, it added some new helper functions in heapam.c, for
example. You can argue that it didn't do enough of it, but that was not
the big issue.

The big issue was at the architecture level. Basically, we liked
vacuuming of XIDs and clog so much that we decided that it'd be nice if
you had to vacuum multixids too, in order to not lose data. Many of the
bugs and issues were not new - we had multixids before - but we upped
the ante and turned minor locking bugs into data loss. And that had
nothing to do with the code structure - we'd have similar issues if we
had rewritten everything java, with the same design.

So, I'm all for refactoring and adding abstractions where it makes
sense, but it's not going to solve design problems.

- Heikki

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2015-06-04 09:17:42 Re: [CORE] postpone next week's release
Previous Message Jeevan Chalke 2015-06-04 08:25:52 Re: [PATCH] two-arg current_setting() with fallback