Re: Cleaning up array_in()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Nikhil Benesch <nikhil(dot)benesch(at)gmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, jian he <jian(dot)universality(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Cleaning up array_in()
Date: 2023-07-04 18:33:14
Message-ID: 874153.1688495594@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Nikhil Benesch <nikhil(dot)benesch(at)gmail(dot)com> writes:
> I spotted a few opportunities for further reducing state tracked by
> `ArrayCount`. You may not find all of these suggestions to be
> worthwhile.

I found some time today to look at these points.

> 1) `in_quotes` appears to be wholly redundant with `parse_state ==
> ARRAY_QUOTED_ELEM_STARTED`.

I agree that it is redundant, but I'm disinclined to remove it because
the in_quotes logic matches that in ReadArrayStr. I think it's better
to keep those two functions in sync. The parse_state represents an
independent set of checks that need not be repeated by ReadArrayStr,
but both functions have to track quoting. The same for eoArray.

> 2) The `empty_array` special case does not seem to be important to
> ArrayCount's callers, which don't even special case `ndims == 0` but
> rather `ArrayGetNItemsSafe(..) == 0`. Perhaps this is a philosophical
> question as to whether `ArrayCount('{{}, {}}')` should return
> (ndims=2, dims=[2, 0]) or (ndims=0). Obviously someone needs to do
> that normalization, but `ArrayCount` could leave that normalization to
> `ReadArrayStr`.

This idea I do like. While looking at the callers, I also noticed
that it's impossible currently to write an empty array with explicit
specification of bounds. It seems to me that you ought to be able
to write, say,

SELECT '[1:0]={}'::int[];

but up to now you got "upper bound cannot be less than lower bound";
and if you somehow got past that, you'd get "Specified array
dimensions do not match array contents." because of ArrayCount's
premature optimization of "one-dimensional array with length zero"
to "zero-dimensional array". We can fix that by doing what you said
and adjusting the initial bounds restriction to be "upper bound cannot
be less than lower bound minus one".

> I also have a sense that `ndims_frozen` made the distinction between
> `ARRAY_ELEM_DELIMITED` and `ARRAY_LEVEL_DELIMITED` unnecessary, and
> the two states could be merged into a single `ARRAY_DELIMITED` state,
> but I've not pulled on this thread hard enough to say so confidently.

I looked at jian he's implementation of that and was not impressed:
I do not think the logic gets any clearer, and it seems to me that
this makes a substantial dent in ArrayCount's ability to detect syntax
errors. The fact that one of the test case error messages got better
seems pretty accidental to me. We can get the same result in a more
purposeful way by giving a different error message for
ARRAY_ELEM_DELIMITED.

So I end up with the attached. I went ahead and dropped
ArrayGetOffset0() as part of 0001, and I split 0002 into two patches
where the new 0002 avoids re-indenting any existing code in order
to ease review, and then 0003 is just a mechanical application
of pgindent.

I still didn't do anything about "number of array dimensions (7)
exceeds the maximum allowed (6)". There are quite a few instances
of that wording, not only array_in's, and I'm not sure whether to
change the rest. In any case that looks like something that
could be addressed separately. The other error message wording
changes here seem to me to be okay.

regards, tom lane

Attachment Content-Type Size
v2-0001-Simplify-and-speed-up-ReadArrayStr.patch text/x-diff 8.1 KB
v2-0002-Rewrite-ArrayCount-to-make-dimensionality-checks-.patch text/x-diff 20.3 KB
v2-0003-Re-indent-ArrayCount.patch text/x-diff 10.7 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2023-07-04 18:36:43 Re: On /*----- comments
Previous Message Andrew Dunstan 2023-07-04 18:19:46 Re: pg_basebackup check vs Windows file path limits