Re: Rethinking plpgsql's assignment implementation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Chapman Flack <chap(at)anastigmatix(dot)net>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: Rethinking plpgsql's assignment implementation
Date: 2020-12-13 21:40:32
Message-ID: 780642.1607895632@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I wrote:
> So my idea here is to add a parsing-mode option to raw_parser(),
> which would be an enum with values like "normal SQL statement",
> "expression only", "type name", "plpgsql assignment statement".

Here's a fleshed-out patch series that attacks things that way.
I'm a lot better pleased with this than with my original approach.

0001 creates the basic infrastructure for "raw parse modes", and as
proof of concept simplifies typeStringToTypeName(). There's a minor
functional improvement there, which is that we can now use the core
parser's error cursor position, so instead of

regression=# do $$ declare x int[23/] ; begin end $$;
ERROR: syntax error at or near "/"
LINE 1: do $$ declare x int[23/] ; begin end $$;
^
CONTEXT: invalid type name "int[23/] "

you get

regression=# do $$ declare x int[23/] ; begin end $$;
ERROR: syntax error at or near "/"
LINE 1: do $$ declare x int[23/] ; begin end $$;
^
CONTEXT: invalid type name "int[23/] "

It's possible we could dispense with the error context callback
in typeStringToTypeName altogether, but I've not experimented much.

0002 tackles the next problem, which is to make this feature accessible
through SPI. There are a couple of possibly-controversial choices here.

Following the principle that we should avoid changing documented SPI
interfaces, we need a new version of SPI_prepare to pass RawParseMode
through. This'll be the fourth one :-(, so I decided it was time to
try to make a definition that can stay API-compatible through future
changes. So it takes a struct of options, and I added a promise that
zeroing the struct is enough to guarantee forward compatibility
through future additions.

This leaves both of the previous iterations, SPI_prepare_cursor
and SPI_prepare_params, unused anywhere in the core code.
I suppose we can't kill them (codesearch.debian.net knows of some
external uses) but I propose to mark them deprecated, with an eye
to at least removing their documentation someday.

I did not want to add a RawParseMode parameter to pg_parse_query(),
because that would have affected a larger number of unrelated modules,
and it would not have been great from a header-inclusion footprint
standpoint either. So I chose to pass down the mode from SPI by
having it just call raw_parser() directly instead of going through
pg_parse_query(). Perhaps this is a modularity violation, or perhaps
there's somebody who really wants the extra tracing overhead in
pg_parse_query() to apply to SPI queries. I'm open to discussing
whether this should be done differently.

(However, having made these two patches, I'm now wondering whether
there is any rhyme or reason to the existing state of affairs
with some callers going through pg_parse_query() while others use
raw_parser() directly. It's hard to knock making a different
choice in spi.c unless we have a coherent policy about which to
use where.)

Next, 0003 invents a raw parse mode for plpgsql expressions (which,
in some contexts, can be pretty nearly whole SELECT statements),
and uses that to get plpgsql out of the business of prefixing
"SELECT " to user-written text. I would not have bothered with this
as a standalone fix, but I think it does make for less-confusing
error messages --- we've definitely had novices ask "where'd this
SELECT come from?" in the past. (I cheated a bit on PERFORM, though.
Unlike other places, it needs to allow UNION, so it can't use the
same restricted syntax.)

0004 then reimplements plpgsql assignment. This is essentially the same
patch I submitted before, but redesigned to work with the infrastructure
from 0001-0003.

0005 adds documentation and test cases. It also fixes a couple
of pre-existing problems that the plpgsql parser had with assigning
to sub-fields of record fields, which I discovered while making the
tests.

Finally, 0006 removes plpgsql's ARRAYELEM datum type, on the grounds
that we don't need it anymore. This might be a little controversial
too, because there was still one way to reach the code: GET DIAGNOSTICS
with an array element as target would do so. However, that seems like
a pretty weird corner case. Reviewing the git history, I find that
I added support for that in commit 55caaaeba; but a check of the
associated discussion shows that there was no actual user request for
that, I'd just done it because it was easy and seemed more symmetric.
The amount of code involved here seems way more than is justified by
that one case, so I think we should just take it out and lose the
"feature". (I did think about whether GET DIAGNOSTICS could be
reimplemented on top of the new infrastructure, but it wouldn't be
easy because we don't have a SQL-expression representation of the
GET DIAGNOSTICS values. Moreover, going in that direction would add
an expression evaluation, making GET DIAGNOSTICS slower. So I think
we should just drop it.)

regards, tom lane

Attachment Content-Type Size
v1-0001-add-raw-parse-modes.patch text/x-diff 8.8 KB
v1-0002-add-spi-prepare-extended.patch text/x-diff 12.2 KB
v1-0003-fix-plpgsql-expressions.patch text/x-diff 32.5 KB
v1-0004-rewrite-plpgsql-assignment.patch text/x-diff 31.9 KB
v1-0005-add-docs-and-tests.patch text/x-diff 21.3 KB
v1-0006-remove-arrayelem-datums.patch text/x-diff 15.8 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-12-14 01:26:01 Re: pg_waldump error message fix
Previous Message Noah Misch 2020-12-13 20:06:08 Re: HASH_BLOBS hazards (was Re: PATCH: logical_work_mem and logical streaming of large in-progress transactions)