Re: Cleaning up array_in()

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Alexander Lakhin <exclusion(at)gmail(dot)com>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Nikhil Benesch <nikhil(dot)benesch(at)gmail(dot)com>, Nathan Bossart <nathandbossart(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Cleaning up array_in()
Date: 2023-11-07 23:52:09
Message-ID: 557218.1699401129@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I got back to looking at this today (sorry for delay), and did a pass
of code review. I think we are getting pretty close to something
committable. The one loose end IMO is this bit in ReadArrayToken:

+ case '"':
+
+ /*
+ * XXX "Unexpected %c character" would be more apropos, but
+ * this message is what the pre-v17 implementation produced,
+ * so we'll keep it for now.
+ */
+ errsave(escontext,
+ (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
+ errmsg("malformed array literal: \"%s\"", origStr),
+ errdetail("Unexpected array element.")));
+ return ATOK_ERROR;

This comes out when you write something like '{foo"bar"}', and I'd
say the choice of message is not great. On the other hand, it's
consistent with what you get from '{"foo""bar"}', and if we wanted
to change that too then some tweaking of the state machine in
ReadArrayStr would be required (or else modify ReadArrayToken so
it doesn't return instantly upon seeing the second quote mark).
I'm not sure that this is worth messing with.

Anyway, I think we are well past the point where splitting the patch
into multiple parts is worth doing, because we've rewritten pretty
much all of this code, and the intermediate versions are not terribly
helpful. So I just folded it all into one patch.

Some notes about specific points:

* Per previous discussion, I undid the change to allow "[1:0]"
dimensions, but I left a comment behind about that.

* Removing the ArrayGetNItemsSafe/ArrayCheckBoundsSafe calls
seems OK, but then we need to be more careful about detecting
overflows and disallowed cases in ReadArrayDimensions.

* I don't think the ARRAYDEBUG code is of any value whatever.
The fact that nobody bothered to improve it to print more than
the dim[] values proves it hasn't been used in decades.
Let's just nuke it.

* We can simplify the state machine in ReadArrayStr some more: it
seems to me it's sufficient to track "expect_delim", as long as you
realize that that really means "expect typdelim or right brace".
(Maybe another name would be better? I couldn't think of anything
short though.)

* I switched to using a StringInfo instead of a fixed-size elembuf,
as Heikki speculated about.

* I added some more test cases to cover things that evidently weren't
sufficiently tested, like the has_escapes business which was flat
out broken in v10, and to improve the code coverage report.

Comments?

regards, tom lane

Attachment Content-Type Size
v11-0001-Simplify-and-speed-up-array-in.patch text/x-diff 49.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2023-11-08 00:20:11 Re: [PATCHES] Post-special page storage TDE support
Previous Message Andres Freund 2023-11-07 23:49:32 Re: Moving forward with TDE [PATCH v3]