Back branches vs. gcc 4.8.0

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Back branches vs. gcc 4.8.0
Date: 2013-04-05 22:14:44
Message-ID: 14242.1365200084@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

After quite a bit of hair-pulling trying to install Fedora 19 Alpha,
I've been able to reproduce the initdb-time failure that's currently
being seen on buildfarm member anchovy, and was also complained of
recently by an end user:
http://www.postgresql.org/message-id/CAOD=oQ-kq3Eg5SOvRYOVxDuqibVWC8R0wEivPsMGcyzZY-nfzA@mail.gmail.com

It is exactly what I suspected, namely that gcc 4.8.0 is applying an
optimization that breaks our code; the odd thing though is that it's
not breaking 9.2 or HEAD, just the older branches.

It turns out that what's happening is that with
-faggressive-loop-optimizations turned on (as it is by default),
gcc decides that loops that iterate over the elements of an int2vector
can iterate at most once, because int2vector is declared with a fixed
size values[] array:

int16 values[1]; /* VARIABLE LENGTH ARRAY */
} int2vector; /* VARIABLE LENGTH STRUCT */

Now, gcc does know better than to make such an assumption
unconditionally, but what I discovered is that it *will* assume this if
the int2vector is declared as a non-last element of a larger struct,
so that (in gcc's little mind anyway) it couldn't possibly really be
a variable-length array.

In other words, the reason 9.2 and up don't break is commit
8137f2c32322c624e0431fac1621e8e9315202f9, which arranged to hide
non-fixed-offset catalog columns from the compiler. Without that,
gcc decides that for instance pg_index.indkey cannot have more than one
member. That breaks the loop in BuildIndexInfo() that copies the key
column numbers into an IndexInfo, leading to the observed failure.

Since gcc 4.8 is going to be on a lot of people's machines pretty soon,
I think we need to do something to prevent it from breaking 8.4.x and
9.0.x. It looks like our choices are (1) teach configure to enable
-fno-aggressive-loop-optimizations if the compiler recognizes it,
or (2) back-port commit 8137f2c32322c624e0431fac1621e8e9315202f9.

I'm a bit leaning towards (1), mainly because I'm not excited about
fighting a compiler arms race in the back branches.

It also strikes me that we ought to take this as a warning sign
that we need to work on getting rid of coding like the above in favor
of genuine "flexible arrays", before the gcc boys think of some other
overly-cute optimization based on the assumption that an array declared
with a fixed size really is fixed.

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rodrigo Barboza 2013-04-05 22:26:49 Unrecognized type error (postgres 9.1.4)
Previous Message Jeff Janes 2013-04-05 22:02:28 Re: BUG #8043: 9.2.4 doesn't open WAL files from archive, only looks in pg_xlog