Re: 8.0.0beta5 FailedAssertion (Crash) when casting composite types

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Kris Shannon <kris(dot)shannon(at)gmail(dot)com>
Cc: Postgresql Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: 8.0.0beta5 FailedAssertion (Crash) when casting composite types
Date: 2004-12-05 22:42:10
Message-ID: 19420.1102286530@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers pgsql-patches

kris(dot)shannon(at)gmail(dot)com writes:
> template2=# CREATE TABLE base (i integer);
> template2=# CREATE TABLE derived () INHERITS (base);
> template2=# INSERT INTO derived (i) VALUES (0);
> template2=# SELECT derived::base FROM derived;
> TRAP: FailedAssertion

The cause of this failure is that parse_coerce.c thinks that a child
table's rowtype is binary-compatible with its parent's rowtype:

if (typeInheritsFrom(inputTypeId, targetTypeId))
{
/*
* Input class type is a subclass of target, so nothing to do ---
* except relabel the type. This is binary compatibility for
* complex types.
*/
return (Node *) makeRelabelType((Expr *) node,
targetTypeId, -1,
cformat);
}

As of 8.0 this isn't really true anymore, because a tuple being used as
a rowtype Datum is supposed to carry its rowtype OID in a field of the
tuple header. A proper coercion would require replacing the type OID in
the header.

In a larger sense, however, this code has been wrong since day one.
A child table must have all the same columns its parent does, but it
need not have them in the same column positions. This can happen after
ALTER TABLE ADD COLUMN on the parent, and it can happen even without
ALTER TABLE by use of multiple inheritance --- columns inherited from a
non-first parent are very unlikely to be in the same positions as they
are in that parent. If the columns aren't in the same positions then
the rowtypes certainly aren't binary-compatible. So at least some
flavors of the bug go back a long time, probably to Berkeley days.
For instance, this dumps core in all versions back to at least 7.1
(though you need to adjust the syntax before 7.4):

regression=# create table p1(ff1 int);
CREATE TABLE
regression=# create table p2(f1 text);
CREATE TABLE
regression=# create function p2text(p2) returns text as 'select $1.f1' language sql;
CREATE FUNCTION
regression=# create table c1(f3 int) inherits(p1,p2);
CREATE TABLE
regression=# insert into c1 values(123456789,'hi', 42);
INSERT 589036 1
regression=# select p2text(c1.*) from c1;
server closed the connection unexpectedly

(BTW, "select p2text(p2.*) from p2" works, at least in the last couple
versions, because the planner understands that it must convert child
rowtypes to match the parent. But that path isn't taken in Kris' example.)

The Really Clean And Correct fix to this, IMHO, would be to invent a new
expression node type that represents coercing a rowtype expression to a
different rowtype. Execution of this node would disassemble and
reassemble the tuple Datum, using code not too much different from
execJunk.c, to produce the right column order and the right rowtype OID
label. However that seems much too large a change for post-RC.
(You could also argue that it requires an initdb, though I'd take the
position that it doesn't because there are no working databases that
would be affected.)

A less clean but much more localized fix would be to cause ExecEvalExpr
to do that work when it finds a RelabelType node whose output type is a
composite type. (We could arrange to lookup the pg_type entry only once
per query, during ExecInitExpr, so the performance hit on normal uses of
RelabelType wouldn't be too bad.) A rough estimate is that this would
require about 100 lines of new code in execQual.c, much of which could
be adapted from other places.

Another possibility is to #ifdef out the code in parse_coerce.c that
thinks it can coerce one rowtype to another this way, causing attempts
to do rowtype coercions to fail with a "can't convert type" kind of error.
I've tried this and it does not seem to break any of the regression
tests, but I'm afraid that it will break cases that people are depending
on in practice. Still, given that we are in RC, maybe this is the only
acceptable answer for 8.0. If I were called on to close the hole in 7.4
and before, I'd say this is the only reasonable quick-fix for those
branches, for sure.

Choice #4 is to do nothing, figuring that a bug that has lasted this
long can wait another release cycle to be fixed. But in today's
security atmosphere I doubt that will be considered acceptable.

I definitely intend to do the "clean" fix in 8.1 after we branch,
but I am unsure what to do in 8.0, or in the back branches. Comments?

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2004-12-05 23:02:00 Re: rules, triggers and views
Previous Message Andreas Pflug 2004-12-05 22:32:34 Re: Error: column "nsptablespace" does not exist

Browse pgsql-patches by date

  From Date Subject
Next Message Marko Kreen 2004-12-05 23:27:27 Re: patch contrib/pgcrypto for win32 (2)
Previous Message Tom Lane 2004-12-05 20:22:07 Re: [BUGS] utf-8 flag always off in plperl function arguments