make check hang on AIX 5L p690 4way/I have two solutions

From: "Tomoyuki Niijima" <NIIJIMA(at)jp(dot)ibm(dot)com>
To: pgsql-patches(at)postgresql(dot)org
Subject: make check hang on AIX 5L p690 4way/I have two solutions
Date: 2002-08-29 16:43:58
Message-ID: OF397DD310.F74CCBE2-ON49256C24.0056F9C3@LocalDomain
Lists: pgsql-patches

Your name : Tomoyuki Niijima
Your email address : niijima(at)jp(dot)ibm(dot)com

System Configuration
---------------------
Architecture (example: Intel Pentium) : IBM 7040-681 (pSeries 690) 4way (LPAR)

Operating System (example: Linux 2.0.26 ELF) : AIX 5L 5.1

PostgreSQL version (example: PostgreSQL-7.2.1): PostgreSQL-7.2.1

Compiler used (example: gcc 2.95.2) : gcc 2.9

Please enter a FULL description of your problem:
------------------------------------------------
When I build PostgreSQL with the steps below and run the regression tests,
backends hang. The problem has been reproduced on two machines, but both
have the same type of hardware and software. I also tried to reproduce the
problem on other machines and on older versions of AIX, but could not.

Please describe a way to repeat the problem. Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------
./configure --enable-multibyte=EUC_JP --with-CC=gcc
make
make check

By attaching dbx (the AIX debugger) to one of the 'postgres:' processes, I
found that the backend was sleeping in semop().

If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------
After looking through the pgsql-hackers mailing list, I focused on the spin
lock issue. The easiest, though perhaps not the best, solution is to give up
HAS_TEST_AND_SET. This actually works.

*** src/include/port/aix.h.org Tue Feb 13 23:32:52 2001
--- src/include/port/aix.h Fri Aug 30 01:02:28 2002
***************
*** 1,8 ****
  #define CLASS_CONFLICT
  #define DISABLE_XOPEN_NLS
! #define HAS_TEST_AND_SET
  #define NO_MKTIME_BEFORE_1970
! typedef unsigned int slock_t;
  
  #include <sys/machine.h>        /* ENDIAN definitions for network
                                   * communication */
--- 1,8 ----
  #define CLASS_CONFLICT
  #define DISABLE_XOPEN_NLS
! /* #define HAS_TEST_AND_SET */
  #define NO_MKTIME_BEFORE_1970
! /* typedef unsigned int slock_t; */
  
  #include <sys/machine.h>        /* ENDIAN definitions for network
                                   * communication */
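Without HAS_TEST_AND_SET, PostgreSQL of this vintage emulates each spin lock with a SysV semaphore, which works but is slower than a native test-and-set; that is presumably why this fix is not the best one. As a rough illustration only (not PostgreSQL's actual code), a binary lock on top of one SysV semaphore might look like this. The helper names sem_lock_create(), sem_lock_acquire(), sem_lock_try(), sem_lock_release() and sem_lock_destroy() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* POSIX leaves union semun for the caller to define (glibc does not
 * provide it); on a platform that predefines it, drop this. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/* Hypothetical helper: one private semaphore, initialised to 1 (free). */
static int sem_lock_create(void)
{
    union semun arg;
    int id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (id < 0)
        return -1;
    arg.val = 1;
    if (semctl(id, 0, SETVAL, arg) < 0) {
        semctl(id, 0, IPC_RMID);
        return -1;
    }
    return id;
}

/* Apply one operation (-1 = take, +1 = give back) to semaphore 0,
 * retrying if a signal interrupts the call. */
static int sem_lock_op(int id, short delta, short flags)
{
    struct sembuf op;

    assert(id >= 0);
    op.sem_num = 0;
    op.sem_op = delta;
    op.sem_flg = flags;
    while (semop(id, &op, 1) < 0) {
        if (errno != EINTR)
            return -1;  /* e.g. EAGAIN from IPC_NOWAIT when already held */
    }
    return 0;
}

/* Sleeps inside semop() until the lock is free. */
static int sem_lock_acquire(int id) { return sem_lock_op(id, -1, 0); }

/* Non-blocking attempt: returns -1 with errno == EAGAIN if already held. */
static int sem_lock_try(int id) { return sem_lock_op(id, -1, IPC_NOWAIT); }

static int sem_lock_release(int id) { return sem_lock_op(id, 1, 0); }

static int sem_lock_destroy(int id) { return semctl(id, 0, IPC_RMID); }
```

Every acquire and release is a system call here, which is the cost a native test-and-set avoids.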

Another, better solution is to use _check_lock() and _clear_lock() to
implement the spin lock. The important thing here is to define S_UNLOCK()
with _clear_lock(). This will also solve the so-called "compiler bug" issue
that someone reported on the mailing list.

AIX also provides other APIs for test-and-set, such as cs(),
compare_and_swap() and fetch_and_or(), but none of them solved my problem.
I wrote a tiny test program to check these AIX APIs for bugs, and found no
problem except with compare_and_swap(): it did not work as a spin lock on
any of the SMP machines I tested, so it seems it cannot be used for this
purpose. I don't know why cs() and fetch_and_or()/fetch_and_and() do not
work with PostgreSQL on the p690; they worked with my test program on every
machine I tested.
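For comparison with the alternatives above, here is a sketch of test-and-set built on a fetch-or and on a compare-and-swap. It uses C11 <stdatomic.h> as a portable stand-in for AIX's fetch_and_or(), fetch_and_and() and compare_and_swap(), whose exact semantics are not reproduced here; slock_t and the function names are hypothetical, and this is not the test program mentioned above.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for AIX's word-sized slock_t. */
typedef atomic_int slock_t;

/* Test-and-set via fetch-or: returns the previous value, so 0 means the
 * lock was free and the caller now holds it. */
static int tas_fetch_or(slock_t *lock)
{
    return atomic_fetch_or_explicit(lock, 1, memory_order_acquire) & 1;
}

/* Test-and-set via compare-and-swap: returns 0 on success. On failure,
 * C11 compare-exchange copies the current value (1) into 'expected', so
 * it must be reset to 0 on every attempt; reusing it would turn the next
 * attempt into a swap from 1 to 1 that "succeeds" while the lock is held. */
static int tas_cas(slock_t *lock)
{
    int expected = 0;

    return !atomic_compare_exchange_strong_explicit(
        lock, &expected, 1, memory_order_acquire, memory_order_relaxed);
}

/* Unlock via fetch-and with release ordering. */
static void unlock_fetch_and(slock_t *lock)
{
    /* Sanity check: releasing a lock nobody holds indicates a bug. */
    assert(atomic_load_explicit(lock, memory_order_relaxed) != 0);
    atomic_fetch_and_explicit(lock, 0, memory_order_release);
}
```

Both variants give the same observable test-and-set behaviour on a single thread; differences, if any, only show up under concurrency and memory-ordering effects.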

*** ./src/include/storage/s_lock.h Wed Jan 30 00:44:42 2002
--- ./src/include/storage/s_lock.h.org Fri Aug 30 01:13:15 2002
***************
*** 440,446 ****
   * Note that slock_t on POWER/POWER2/PowerPC is int instead of char
   * (see storage/ipc.h).
   */
! #define TAS(lock) cs((int *) (lock), 0, 1)
  #endif /* _AIX */
  
--- 440,447 ----
   * Note that slock_t on POWER/POWER2/PowerPC is int instead of char
   * (see storage/ipc.h).
   */
! #define TAS(lock) _check_lock(lock, 0, 1)
! #define S_UNLOCK(lock) _clear_lock(lock, 0)
  #endif /* _AIX */
  
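A plausible reading of why defining S_UNLOCK() matters: PowerPC is weakly ordered, so a plain store of 0 may become visible to another CPU before the stores made inside the critical section. The AIX calls are not available elsewhere, so this sketch uses C11 <stdatomic.h> as a portable stand-in: tas() plays the role of _check_lock(lock, 0, 1) (0 means the lock was taken) and s_unlock() that of _clear_lock(lock, 0). The names, the slock_t typedef and the counter test are illustrative assumptions, and the ordering guarantees described are C11's, not quoted from the AIX documentation.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>

/* Hypothetical stand-in for AIX's word-sized slock_t. */
typedef atomic_int slock_t;

/* Like _check_lock(lock, 0, 1): 0 if the lock was free and is now held,
 * nonzero otherwise. Acquire ordering keeps the critical section's
 * accesses from moving above the lock. */
static int tas(slock_t *lock)
{
    int expected = 0;

    return !atomic_compare_exchange_strong_explicit(
        lock, &expected, 1, memory_order_acquire, memory_order_relaxed);
}

/* Like _clear_lock(lock, 0): release ordering publishes every store made
 * inside the critical section before the lock reads as free. */
static void s_unlock(slock_t *lock)
{
    assert(atomic_load_explicit(lock, memory_order_relaxed) == 1);
    atomic_store_explicit(lock, 0, memory_order_release);
}

static void s_lock(slock_t *lock)
{
    while (tas(lock))
        sched_yield();  /* give the holder a chance to run */
}

#define NTHREADS 4
#define NITERS 10000

static slock_t demo_lock;
static long demo_counter;  /* plain long, protected only by demo_lock */

static void *worker(void *arg)
{
    (void) arg;
    for (int i = 0; i < NITERS; i++) {
        s_lock(&demo_lock);
        demo_counter++;
        s_unlock(&demo_lock);
    }
    return NULL;
}

/* Runs NTHREADS workers; with a correct lock the result is
 * NTHREADS * NITERS. Returns -1 if a thread cannot be created. */
static long run_demo(void)
{
    pthread_t threads[NTHREADS];
    int started = 0;

    atomic_store(&demo_lock, 0);
    demo_counter = 0;
    for (; started < NTHREADS; started++)
        if (pthread_create(&threads[started], NULL, worker, NULL) != 0)
            break;
    for (int i = 0; i < started; i++)
        pthread_join(threads[i], NULL);
    return started == NTHREADS ? demo_counter : -1;
}
```

On strongly ordered hardware such as x86, a relaxed unlock often passes this kind of test anyway, so a passing run on one machine does not prove a lock correct on a weakly ordered POWER SMP.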