Alignment padding bytes in arrays vs the planner

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Alignment padding bytes in arrays vs the planner
Date: 2011-04-26 23:23:12
Message-ID: 2201.1303860192@sss.pgh.pa.us
Lists: pgsql-hackers

I looked into David Johnston's report of curious planner behavior:
http://archives.postgresql.org/pgsql-general/2011-04/msg00885.php

What is happening is that the planner doesn't reliably see the
expression ti_status = ANY ('{ACTIVE,DISPATCHED,FAILURE}'::text[])
as equal to other copies of itself, so it is inconsistent about
whether to suppress duplicates. (There might be something else
funny too, but that's most of it.) The reason it doesn't see this
is that the text[] constant is formed from evaluation of an ArrayExpr,
and construct_md_array is cavalier about pad bytes in its output.
Since equalfuncs.c compares Const datums by simple memcmp(), equal()
doesn't say TRUE for two array constants whose padding bytes differ.
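
As a rough illustration of that failure mode, here is a standalone sketch
in plain C. It is not PostgreSQL source: pack_strings, packed_equal_bytes,
ALIGN4 and the [1-byte length][chars] layout are all made up for the
example. The only point is that a packer which skips over alignment
padding without writing it produces output whose bytes depend on what the
buffer held beforehand.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round n up to the next multiple of 4 (hypothetical alignment rule). */
#define ALIGN4(n) (((size_t) (n) + 3u) & ~(size_t) 3u)

/*
 * Pack n strings into buf as [1-byte length][chars], starting each
 * element on a 4-byte boundary.  The pad bytes between elements are
 * skipped, not written, so they keep whatever buf held before.
 * Returns the number of bytes used.  Caller must supply enough room.
 */
static size_t
pack_strings(unsigned char *buf, const char *const *strs, size_t n)
{
    size_t off = 0;

    for (size_t i = 0; i < n; i++)
    {
        size_t len = strlen(strs[i]);

        assert(len < 256);              /* length must fit in one byte */
        buf[off] = (unsigned char) len;
        memcpy(buf + off + 1, strs[i], len);
        off = ALIGN4(off + 1 + len);    /* pad bytes left untouched */
    }
    return off;
}

/* Bytewise equality over the used part of two packed buffers. */
static int
packed_equal_bytes(const unsigned char *a, size_t alen,
                   const unsigned char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

Packing ACTIVE, DISPATCHED and FAILURE into one zero-filled buffer and one
buffer pre-filled with 0x7F gives two 28-byte results that differ only at
the pad positions (offsets 7 and 19), so the bytewise check reports them
as unequal even though the logical contents are identical.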

We've run into other manifestations of this issue before. A while ago
I made a push to ensure that datatype input functions didn't leave any
ill-defined padding bytes in their results, as a result of similar
misbehavior for simple constants. But this example shows that we'd
really have to enforce the rule of "no ill-defined bytes" for just about
every user-callable function's results, which is a pretty ugly prospect.

The seemingly obvious alternative is to teach equal() to use
type-specific comparison functions that will successfully ignore
semantically-insignificant bytes in a value's representation. However,
this answer has its own serious problems, including performance,
transaction safety (I don't think we can assume that equal() is always
called within live transactions) and the difficulty of identifying
suitable comparison functions. Not all types have btree comparison
functions, and for some of the ones that do, btree "equality" does not
imply that the values are indistinguishable for every purpose, which is
what we really need from equal().
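
One familiar instance of the gap between comparison-operator equality and
indistinguishability can be shown in plain C, without reference to any
particular PostgreSQL type. This sketch assumes IEEE 754 doubles, and the
function names are invented for the illustration: positive and negative
zero compare equal, yet their bytes and their sign differ.

```c
#include <math.h>
#include <string.h>

/* Equality as a comparison operator would define it. */
static int
double_equal_by_comparison(double a, double b)
{
    return a == b;
}

/* Equality of the stored representation, byte for byte. */
static int
double_equal_bytes(double a, double b)
{
    return memcmp(&a, &b, sizeof(double)) == 0;
}

/* An observable property that the comparison above does not see. */
static int
double_sign_is_negative(double x)
{
    return signbit(x) != 0;
}
```

With 0.0 and -0.0 as inputs, the first function says "equal" while the
other two can tell the values apart, so substituting one value for the
other on the strength of the comparison alone would not be safe for
every purpose.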

We can solve David's immediate case by ensuring that the functions in
arrayfuncs.c don't emit arrays containing ill-defined bytes, but I have
little confidence that there aren't similar bugs hiding under other
rocks. The only bright spot in the mess is that datatypes containing
pad bytes aren't that common, since usually people feel a need to make
the on-disk representation as small as possible.
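
The "no ill-defined bytes" approach might look roughly like the following
standalone sketch. Again this is not the arrayfuncs.c code: the name
pack_strings_zero_padded, the ALIGN4 macro and the
[1-byte length][chars] layout are hypothetical. The packer writes zeroes
into every pad byte it skips, so the output is a pure function of the
logical contents and a bytewise comparison becomes reliable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Round n up to the next multiple of 4 (hypothetical alignment rule). */
#define ALIGN4(n) (((size_t) (n) + 3u) & ~(size_t) 3u)

/*
 * Pack n strings into buf as [1-byte length][chars], starting each
 * element on a 4-byte boundary and explicitly zeroing the pad bytes.
 * Returns the number of bytes used.  Caller must supply enough room.
 */
static size_t
pack_strings_zero_padded(unsigned char *buf, const char *const *strs,
                         size_t n)
{
    size_t off = 0;

    for (size_t i = 0; i < n; i++)
    {
        size_t len = strlen(strs[i]);
        size_t end = off + 1 + len;     /* first byte past the data */
        size_t next = ALIGN4(end);      /* start of the next element */

        assert(len < 256);              /* length must fit in one byte */
        buf[off] = (unsigned char) len;
        memcpy(buf + off + 1, strs[i], len);
        memset(buf + end, 0, next - end);   /* define the pad bytes */
        off = next;
    }
    return off;
}
```

An alternative with the same effect is to zero the whole buffer before
filling it; either way, the cost is that every function producing such
values has to remember to do it.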

Any ideas about better answers?

regards, tom lane
