Supporting NULL elements in arrays

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: pgsql-hackers(at)postgreSQL(dot)org
Subject: Supporting NULL elements in arrays
Date: 2005-11-08 00:09:04
Message-ID: 29807.1131408544@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I'm starting to think about what it'll take to allow arrays to contain
elements that are NULL. The behavioral semantics seem pretty
straightforward, but there are a couple of areas that need discussion.

One trouble spot is what the I/O representation should look like.
Since 8.0, the array input parser has rejected empty elements:

regression=# select '{a,,c}'::text[];
ERROR: malformed array literal: "{a,,c}"
regression=# select '{a, ,c}'::text[];
ERROR: malformed array literal: "{a, ,c}"
-- the right way to specify a zero-length string is:
regression=# select '{a,"",c}'::text[];
text
----------
{a,"",c}
(1 row)

and so the most straightforward thing to do is define an empty element
as meaning a NULL. But this might be objected to on a couple of grounds:

1. "Since 8.0" isn't really old enough --- there may well be applications
still out there that think '{a,,c}'::text[] should produce a zero-length
string element and not a NULL element. (Note: this isn't a hazard for
reloading old dump files, because the array output routine has dumped
empty strings as "" since 7.0 if not before.)

2. Even today, the array documentation at
http://developer.postgresql.org/docs/postgres/arrays.html
doesn't actually *say* that empty elements are disallowed.

I don't see any alternatives that seem better, though, and the
empty-element convention at least has the virtue of being reasonably
compatible with what we did for I/O of composite data types. Anyone
have a better idea?

Another issue is what to do with the internal representation. I think
that basically we want to insert a null bitmap just following the
dimension info, with the option to omit the bitmap if there are no
null values. Now, when the bitmap is present, it would be fairly tedious
to calculate the length of the bitmap to determine the offset to the
actual data; and that's an operation that we'd like to be cheap. What
I'm thinking of doing is commandeering the whole "flags" field of
ArrayType (which is currently unused and should be always zero), and
redefining it thus:
zero: bitmap not present (fully backwards-compatible case)
not zero: bitmap is present; value is offset to start of data
I wouldn't propose doing this if I thought we had any pressing reason
to save some array flag bits for some other purpose; but I can't think
of anything else we might want 'em for. Did anyone have any pet ideas
this would foreclose?

Also, with respect to the binary I/O representation, I'd suggest replacing
the "flag" word with "zero if no bitmap, 1 if bitmap present". This would
require both sender and receiver to calculate the bitmap length from the
given dimensions, but it seems best to require that for error-checking
reasons.

Thoughts anyone?

regards, tom lane

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim C. Nasby 2005-11-08 01:55:21 FSM pages stored v. pages needed
Previous Message Tom Lane 2005-11-07 23:31:40 Re: broken comment justification logic in new pgindent