NetBSD 1.6 Alpha Postgresql Fix

From: "Thomas T(dot) Thai" <tom(at)minnesota(dot)com>
To: <pgsql-general(at)postgresql(dot)org>
Cc: <teodor(at)sigaev(dot)ru>, <tgl(at)sss(dot)pgh(dot)pa(dot)us>, <oleg(at)sai(dot)msu(dot)su>, <port-alpha(at)netbsd(dot)org>, <mcmahill(at)mtl(dot)mit(dot)edu>, <wiz(at)netbsd(dot)org>
Subject: NetBSD 1.6 Alpha Postgresql Fix
Date: 2003-03-29 08:28:14
Message-ID: 3112.63.226.186.156.1048926494.squirrel@mail.minnesota.com
Lists: pgsql-general

I'm writing this mainly to get it into the archive. Hopefully it'll save
someone else many hours of headache.

NetBSD 1.6 / Alpha (64-bit)
Postgresql 7.3.2 / 7.4-dev

When running the regression tests, 17 tests failed under 7.3.2 and 1 test
failed under 7.4-dev. Most of the failures were due to this error:

ERROR: datumGetSize: Invalid typLen 0

The same error can be reproduced as follows. Run initdb, start the
postmaster, and then:

% createdb foo
% psql foo < bug.sql
CREATE TABLE
INSERT 16996 1
INSERT 16997 1
CREATE TABLE
INSERT 17003 1
 ts_name | ts_name
---------+---------
 default | default
 default | default
(2 rows)

VACUUM
ERROR: datumGetSize: Invalid typLen 0

bug.sql is very simple:
------------------------------------------------
CREATE TABLE pg_ts_cfgmap (
    ts_name text
);

insert into pg_ts_cfgmap values ('default');
insert into pg_ts_cfgmap values ('default');

CREATE TABLE pg_ts_cfg (
    ts_name text
);

insert into pg_ts_cfg values ('default');

select
    *
from
    pg_ts_cfgmap,
    pg_ts_cfg
where
    pg_ts_cfgmap.ts_name = pg_ts_cfg.ts_name;

vacuum analyze pg_ts_cfgmap;

select
    *
from
    pg_ts_cfgmap,
    pg_ts_cfg
where
    pg_ts_cfgmap.ts_name = pg_ts_cfg.ts_name;

-----------

Tom Lane's discovery:

AFAICT, this is nothing more nor less than a compiler bug: an int16
variable in get_var_maximum is being passed to datumCopy, which declares
its argument as type int. Inside get_var_maximum, gdb shows the int16
variable as having value 64, which is correct (the variable in question
is of type NAME, so that is the right length for it). But datumCopy is
receiving a value of zero. It appears there's an error in gcc that makes
it do the int16->int widening incorrectly in this particular case.

We could maybe defend against this particular case by inserting an
explicit cast, but it certainly wouldn't be practical to put casts
into the many other places where function arguments are supposed to be
coerced to the right size. I'd counsel trying to get the compiler
bug fixed. 2.95.3 is kinda old; maybe there is a later release that
fixes the problem...

------------
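
To make the widening Tom describes concrete, here is a minimal sketch in C.
It is not the PostgreSQL source: receive_len, pass_implicit, pass_explicit
and check_widening are hypothetical names, and receive_len only stands in
for a callee that, like datumCopy, declares its length argument as int. With
a correct compiler the implicit and the explicitly cast call both deliver 64.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t int16;   /* stand-in for the PostgreSQL int16 typedef */

/*
 * Hypothetical stand-in for a callee that declares its length argument
 * as plain int. Returns -1 for the length 0 that produced
 * "Invalid typLen 0", otherwise the length it received.
 */
int receive_len(int typLen)
{
    return (typLen == 0) ? -1 : typLen;
}

/* Implicit widening: the compiler converts int16 to int at the call. */
int pass_implicit(int16 typLen)
{
    return receive_len(typLen);
}

/* Explicit cast: the per-call workaround, impractical to apply everywhere. */
int pass_explicit(int16 typLen)
{
    return receive_len((int) typLen);
}

/* With a correct compiler both paths deliver the NAME length of 64. */
void check_widening(void)
{
    int16 typLen = 64;

    assert(pass_implicit(typLen) == 64);
    assert(pass_explicit(typLen) == 64);
}
```

Calling check_widening() from a small test program is one way to compare
compilers. I have not verified that this reduced case triggers the
gcc-2.95.3 bug; it only shows the conversion that went wrong.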

The problem turned out to be a bug in gcc-2.95.3 on 64-bit Alpha. The
solution was to build with gcc-3.2.2 from the NetBSD package collection,
in /usr/pkgsrc/lang/gcc3.

Then, in the Postgresql-7.4-dev source tree:

./configure <your options> CC=/usr/pkg/gcc-3.2.2/bin/cc

Then compile as usual. All 90 regression tests and the test above passed.
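
If you want to check which compiler a build really used, something like the
following sketch may help. It relies only on the predefined macros __GNUC__
and __alpha__; the function name compiler_is_suspect is hypothetical, and
treating every gcc older than 3.0 on Alpha as suspect is my assumption based
on the one version (2.95.3) that failed here.

```c
/*
 * Hypothetical helper: returns 1 when this file was compiled for Alpha
 * by a gcc older than 3.0, and 0 otherwise.
 */
int compiler_is_suspect(void)
{
#if defined(__GNUC__) && defined(__alpha__) && (__GNUC__ < 3)
    return 1;
#else
    return 0;
#endif
}
```

Compiled with the gcc-3.2.2 from pkgsrc it should return 0.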

I want to thank the following people:

- "Oleg Bartunov" <oleg(at)sai(dot)msu(dot)su>
- "Teodor Sigaev" <teodor(at)sigaev(dot)ru>
- "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>

--
Thomas T. Thai
