[PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)

From: Brendan Jurd <direvus(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: "David E(dot) Wheeler" <david(at)justatheory(dot)com>, "pgsql-hackers(at)postgresql(dot)org Hackers" <pgsql-hackers(at)postgresql(dot)org>, Pavel Stěhule <pavel(dot)stehule(at)gmail(dot)com>
Subject: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)
Date: 2013-03-20 23:45:54
Message-ID: CADxJZo0keVhSRzUnot2Y6g46tsP7f-eV28iEmBd3AtLjU-YTMA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 17 March 2013 05:19, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> Brendan Jurd <direvus(at)gmail(dot)com> writes:
>> On 16 March 2013 09:07, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>>> The thing is that that syntax creates an array of zero dimensions,
>>> not one that has 1 dimension and zero elements.
>
>> I'm going to ask the question that immediately comes to mind: Is there
>> anything good at all about being able to define a zero-dimensional
>> array?
>
> Perhaps not. I think for most uses, a 1-D zero-length array would be
> just as good. I guess what I'd want to know is whether we also need
> to support higher-dimensional zero-size arrays, and if so, what does
> the I/O syntax for those look like?
>
> Another fly in the ointment is that if we do redefine '{}' as meaning
> something other than a zero-D array, how will we handle existing
> database entries that are zero-D arrays?
>

Hello hackers,

I submit a patch to rectify the weird and confusing quirk of Postgres
to use "zero dimensions" to signify an empty array.

Rationale:
The concept of an array without dimensions is a) incoherent, and b)
leads to astonishing results from our suite of user-facing array
functions. array_length, array_dims, array_ndims, array_lower and
array_upper all return NULL when presented such an array.

When a user writes ARRAY[]::int[], they expect to get an empty array,
but instead we've been giving them a bizarre semi-functional
proto-array. Not cool.

Approach:
The patch teaches postgres to regard zero as an invalid number of
dimensions (which it very much is), and allows instead for fully armed
and operational empty arrays.

The simplest form of empty array is the 1-D empty array (ARRAY[] or
'{}') but the patch also allows for empty arrays over multiple
dimensions, meaning that the final dimension has a length of zero, but
the other dimensions may have any valid length. For example,
'{{},{},{}}' is a 2-D empty array with dimension lengths {3,0} and
dimension bounds [1:3][1:0].

An empty array dimension may have any valid lower bound, but by
default the lower bound will be 1 and the upper bound will therefore
be 0.

Any zero-D arrays read in from existing database entries will be
output as 1-D empty arrays from array_recv.

There is an existing limitation where you cannot extend a multi-D
array by assigning values to subscripts or slices. I've made no
attempt to address that limitation.

The patch improves some error reporting, particularly by adding
errdetail messages for ereports of problems parsing a curly-brace
array literal.

There is some miscellaneous code cleanup included; I moved the
hardcoded characters '{', '}', etc. into constants, unwound a
superfluous inner loop from ArrayCount, and corrected some mistakes in
code comments and error texts. If preferred for reviewing, I can
split those changes (and/or the new errdetails) out into a separate
patch.

Incompatibility:
This patch introduces an incompatible change in the behaviour of the
aforementioned array functions -- instead of returning NULL for empty
arrays they return meaningful values. Applications relying on the old
behaviour to test for emptiness may be disrupted. One can
future-proof against this by changing e.g.

IF array_length(arr, 1) IS NULL ...

to

IF coalesce(array_length(arr, 1), 0) = 0 ...

There is also a change in the way array subscript assignment works.
Previously it wasn't possible to make a distinction between assigning
to an empty array and assigning to a NULL, so in either case an
expression like "arr[4] := 1" would create "[4:4]={1}". Now there is
a distinction. If you assign to an empty array you will get "{NULL,
NULL, NULL, 1}", whereas if you assign to a NULL you'll get
"[4:4]={1}".

Regression tests:
The regression tests were updated to reflect behavioural changes.

Documentation:
A minor update to the prose in chapter 8.15 is included in the patch.

Issues:
Fixed-length arrays (like oidvector) are zero-based, which means that
they are rendered into string form with their dimension info shown.
So an empty oidvector now renders as "[0:-1]={}", which is correct but
ugly. I'm not delighted with this side-effect but I'm not sure what
can be done about it. I'm open to ideas.

Diffstat:
doc/src/sgml/array.sgml | 4 +-
src/backend/executor/execQual.c | 77 +++++-
src/backend/utils/adt/array_userfuncs.c | 23 +-
src/backend/utils/adt/arrayfuncs.c | 671
+++++++++++++++++++++++++----------------------
src/backend/utils/adt/arrayutils.c | 4 +
src/backend/utils/adt/xml.c | 19 +-
src/include/c.h | 1 +
src/include/utils/array.h | 4 +
src/pl/plpython/plpy_typeio.c | 3 -
src/test/isolation/expected/timeouts.out | 16 +-
src/test/isolation/specs/timeouts.spec | 8 +-
src/test/regress/expected/arrays.out | 55 ++--
src/test/regress/expected/create_function_3.out | 2 +-
src/test/regress/expected/polymorphism.out | 2 +-
src/test/regress/sql/arrays.sql | 6 +-
15 files changed, 519 insertions(+), 376 deletions(-)

I'll add this to the Commit Fest app. It is intended to resolve TODO
item "Improve handling of empty arrays".

Cheers,
BJ

Attachment Content-Type Size
zero-length-array.diff.bz2 application/x-bzip2 13.0 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2013-03-20 23:52:34 Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)
Previous Message Dimitri Fontaine 2013-03-20 23:03:21 Re: Let's invent a function to report lock-wait-blocking PIDs