Re: BUG #1208: Invalid page header

From: "Robert E(dot) Bruccoleri" <bruc(at)stone(dot)congenomics(dot)com>
To: tgl(at)sss(dot)pgh(dot)pa(dot)us (Tom Lane)
Cc: peter_e(at)gmx(dot)net, pgman(at)candle(dot)pha(dot)pa(dot)us, bruc(at)stone(dot)congenomics(dot)com, pgsql-bugs(at)postgresql(dot)org, hubert(dot)froehlich(at)bvv(dot)bayern(dot)de
Subject: Re: BUG #1208: Invalid page header
Date: 2004-08-16 23:41:09
Message-ID: 200408162341.i7GNfAqi209668@stone.congenomics.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-bugs

Dear All,
Thanks for responding. First of all, the Intel compiler does
not accept inline assembly code, so I substituted the following
compiler intrinsic in its place:

#if defined(__ia64__) || defined(__ia64) /* __ia64 used by ICC compiler? */
/* Intel Itanium */
#define HAS_TEST_AND_SET
#include <ia64intrin.h>

typedef unsigned long slock_t;

#define TAS(lock) _InterlockedExchange64(lock, 1)

#endif /* __ia64__ || __ia64 */

With this code in 7.4.3, I had the problem outlined in my bug
report. I also ran the spin lock test program in s_lock.c, and it
failed as expected with a stuck spinlock. Do you have a locking test
program that is designed to detect errors in a multiprocessor
environment?

Second, I've just gotten the 8.0 beta 1 release, built it
using normal optimization, and it did not encounter any invalid page
headers. Here is a typical compiler command with flags:

icc -w0 -ansi_alias- -O2 -fno-strict-aliasing

It is important to keep in mind that the machine used for this
test can keep all 14 PostgreSQL backends running simultaneously, so any
race conditions would be more likely to occur on it than on a typical
2 or 4 way Linux box.

Question: were there any significant changes made to the
buffer management code between 7.4 and 8.0 that would explain the
difference?

I haven't tried rerunning 7.4.3 without optimization to see if
the problem disappears in that release. Since the 8.0beta1 release
appears OK, and the test run takes about three days, so I'm reluctant
to do it unless there's some value in performing test. Please tell me
if there is.

Another question: on a machine which has this high level of
parallelism, does it make sense to use a spinlock to control access to
the buffer cache instead of a lightweight lock?

Thanks. --Bob

Tom Lane writes:
>
>
> Peter Eisentraut <peter_e(at)gmx(dot)net> writes:
> > Tom Lane wrote:
> >> But that code is gcc-only, and he's not using gcc.
>
> > I think the icc compiler claims to be gcc-compatible in that area, so
> > it's quite likely that the gcc assembler code would be used.
>
> Oh, good point.
>
> In that case it seems entirely possible that the assembly code
> tightening-up patch that I made for 8.0 is relevant. The point of that
> patch was to prevent the compiler from making inappropriate
> optimizations around a spinlock TAS. We have not seen any indication
> that existing gcc releases would actually do anything unwanted ... but
> icc might have different/more aggressive optimizations.
>
> [ looks at code... ] But it looks like 7.4's IA64 code already had the
> memory-clobber constraint, so there doesn't appear to be any significant
> change there since 7.4. I suppose it's worth trying the insignificant
> change though:
>
> __asm__ __volatile__(
> " xchg4 %0=%1,%2 \n"
> : "=r"(ret), "=m"(*lock)
> : "r"(1), "1"(*lock)
> : "memory");
>
> to
>
> __asm__ __volatile__(
> " xchg4 %0=%1,%2 \n"
> : "=r"(ret), "+m"(*lock)
> : "r"(1)
> : "memory");
>

> at line 125ff of src/include/storage/s_lock.h. Note "=m" becomes "+m"
> to replace the separate "1" constraint.
>
> Robert, have you tried backing off compiler optimization levels to see
> if anything changes?
>
> regards, tom lane
>

+-----------------------------+------------------------------------+
| Robert E. Bruccoleri, Ph.D. | email: bruc(at)acm(dot)org |
| President, Congenair LLC | URL: http://www.congen.com/~bruc |
| P.O. Box 314 | Phone: 609 818 7251 |
| Pennington, NJ 08534 | |
+-----------------------------+------------------------------------+

In response to

Responses

Browse pgsql-bugs by date

  From Date Subject
Next Message Bruce Momjian 2004-08-16 23:42:45 Re: [BUGS] 8.0.0beta1: -lpthread missing
Previous Message Tom Lane 2004-08-16 18:12:24 Re: BUG #1208: Invalid page header