Re: Large C files

From: Bruce Momjian <bruce(at)momjian(dot)us>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Peter Geoghegan <peter(at)2ndquadrant(dot)com>, Greg Stark <stark(at)mit(dot)edu>, Heikki Linnakangas <heikki(dot)linnakangas(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: Large C files
Date: 2011-09-24 13:30:18
Message-ID: 201109241330.p8ODUIO15811@momjian.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Robert Haas wrote:
> On Fri, Sep 9, 2011 at 11:28 PM, Peter Geoghegan <peter(at)2ndquadrant(dot)com> wrote:
> > It's very difficult or impossible to anticipate how effective the tool
> > will be in practice, but when you consider that it works and does not
> > produce false positives for the first 3 real-world cases tested, it
> > seems reasonable to assume that it's at least worth having around. Tom
> > recently said of a previous pgrminclude campaign in July 2006 that "It
> > took us two weeks to mostly recover, but we were still dealing with
> > some fallout in December". I think that makes the case for adding this
> > tool or some refinement as a complement to pgrminclude in src/tools
> > fairly compelling.
>
> I'm not opposed to adding something like this, but I think it needs to
> either be tied into the actual running of the script, or have a lot
> more documentation than it does now, or both. I am possibly stupid,
> but I can't understand from reading the script (or, honestly, the
> thread) exactly what kind of pgrminclude-induced errors this is
> protecting against; but even if we clarify that, it seems like it
> would be a lot of work to run it manually on all the files that might
> be affected by a pgrminclude run, unless we can somehow automate that.
>
> I'm also curious to see how much more fallout we're going to see from
> that run. We had a few glitches when it was first done, but it didn't
> seem like they were really all that bad. It might be that we'd be
> better off running pgrminclude a lot *more* often (like once a cycle,
> or even after every CommitFest), because the scope of the changes
> would then be far smaller and we wouldn't be dealing with 5 years of
> accumulated cruft all at once; we'd also get a lot more experience
> with what works or does not work with the script, which might lead to
> improvements in that script on a less-than-geologic time scale.

Interesting idea. pgrminclude has three main problems:

o defines --- to prevent removing an include that is referenced in an
#ifdef block that is not reached, it removes ifdef blocks, though that
might cause the file not to compile and be skipped.

o ## expansion --- we found that CppAsString2() uses the CCP expansion
##, which throws no error if the symbol is does not exist (perhaps
through the removal of a define). This is the problem we had with
tablespace directory names, and the script now checks for CppAsString2
and friends and skips the file.

o imbalance of includes --- pgrminclude can remove includes that are
not required, but this can cause other files to need many more includes.
This is the imbalance include problem Tom found.

The submitted patch to compare object files only catches the second
problem we had.

I think we determined that the best risk/reward is to run it every five
years. The pgrminclude run removed 627 include references, which is
0.006% of our 1,077,878 lines of C code.

--
Bruce Momjian <bruce(at)momjian(dot)us> http://momjian.us
EnterpriseDB http://enterprisedb.com

+ It's impossible for everything to be true. +

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Martijn van Oosterhout 2011-09-24 13:45:26 Re: Re: memory barriers (was: Yes, WaitLatch is vulnerable to weak-memory-ordering bugs)
Previous Message Simon Riggs 2011-09-24 13:07:16 Re: unite recovery.conf and postgresql.conf