Re: Large C files

From: Peter Geoghegan <peter(at)2ndquadrant(dot)com>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Bruce Momjian <bruce(at)momjian(dot)us>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Peter Eisentraut <peter_e(at)gmx(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Jan Urbański <wulczer(at)wulczer(dot)org>
Subject: Re: Large C files
Date: 2011-09-08 20:45:56
Message-ID: CAEYLb_WzY_1f+8PCzLxtyd4MNVoEiBC9KxPX97+J5qV-7nX2uQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 8 September 2011 15:43, Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> I wouldn't be too enthusiastic about
> starting a project like this in January, but now seems fine.  A bigger
> problem is that I don't hear anyone volunteering to do the work.

You seem to have a fairly strong opinion on the xlog.c code. It would
be useful to hear any preliminary thoughts that you might have on what
we'd end up with when this refactoring work is finished. If I'm not
mistaken, you think that it is a good candidate for being refactored
not so much because of its size, but for other reasons - could you
please elaborate on those? In particular, I'd like to know what
boundaries it is envisaged that the code should be refactored along to
increase its conceptual integrity, or to better separate concerns. I
assume that that's the idea, since each new .c file would have to have
a discrete purpose.

On 7 September 2011 19:12, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> The pgrminclude-induced bug you just fixed shows a concrete way in which
> moving code from one file to another might silently break it, ie, it
> still compiles despite lack of definition of some symbol it's intended
> to see.

Of course, the big unknown here is the C preprocessor. There is
nothing in principle that stops a header file from slightly or utterly
altering the semantics of any file it is included in. That said, it
wouldn't surprise me if the nm-diff shell script I proposed (or a
slight variant intended to catch pgrminclude type bugs) caught many of
those problems in practice.

Attached is simple POC nm-diff.sh, intended to detect
pgrminclude-induced bugs (split c file variant may follow if there is
interest). I tested this on object files compiled before and after the
bug fix to catalog.h in this commit:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f81fb4f690355bc88fee69624103956fb4576fe5

or rather, I tested things after that commit with and without the
supposedly unneeded header; I removed catalog version differences,
applying Ockham's razor, so there was exactly one difference. That
didn't change anything; I just saw the same thing as before, which
was:

[peter(at)localhost commands]$ nm-diff.sh tablespace.o tablespace.old.o
Symbol table differences:
50c50
< 00000000000005e7 r __func__.15690
---
> 00000000000005ef r __func__.15690
WARNING: Symbol tables don't match!

I decided to add a bunch of obviously superfluous headers (a bunch of
xlog stuff) to the same file to verify that they *didn't* change
anything, figuring that it was very unlikely that adding these headers
could *really* change something. However, they did, but I don't think
that that proves that the script is fundamentally flawed (they didn't
*really* change anything):

[peter(at)localhost commands]$ nm-diff.sh tablespace.o tablespace.old.o
Symbol table differences:
41,50c41,50
< 00000000000006f0 r __func__.15719 <-- symbol name differs, offset does not
< 00000000000006d0 r __func__.15730
< 00000000000006be r __func__.15750
< 00000000000006a0 r __func__.15759
< 0000000000000680 r __func__.15771
< 0000000000000660 r __func__.15793
< 0000000000000640 r __func__.15803
< 0000000000000620 r __func__.15825
< 0000000000000600 r __func__.15886
< 00000000000005e7 r __func__.15903 <-- value/local offset the same as before
---
> 00000000000006f0 r __func__.15506
> 00000000000006d0 r __func__.15517
> 00000000000006be r __func__.15537
> 00000000000006a0 r __func__.15546
> 0000000000000680 r __func__.15558
> 0000000000000660 r __func__.15580
> 0000000000000640 r __func__.15590
> 0000000000000620 r __func__.15612
> 0000000000000600 r __func__.15673
> 00000000000005ef r __func__.15690 <-- Also, value/local offset the same as before (same inconsistency too)
WARNING: Symbol tables don't match!

My analysis is that the fact that the arbitrary symbol names assigned
within this read-only data section differ likely has no significance
(it looks like the compiler assigns them at an early optimisation
stage like Postgres assigns sequence values, before some are
eliminated at a later stage - I read ELF docs, but couldn't confirm
this, but maybe should have read GCC docs), so the next revision of
this script should simply ignore them using a regex if they look like
this; only the internal offsets/values matter, and they have been
observed to differ based on whether or not the header is included per
Bruce's revision (importantly, the difference in offset/values of the
last __func__ remains just the same regardless of whether the
superfluous xlog headers are included).

Does that seem reasonable? Is there interest in developing this idea further?

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

Attachment Content-Type Size
nm-diff.sh application/x-sh 1.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2011-09-08 20:56:09 Re: memory-related bugs
Previous Message Daniel Farina 2011-09-08 20:12:25 Re: memory-related bugs