Re: Rethinking plpgsql's assignment implementation

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
Cc: Chapman Flack <chap(at)anastigmatix(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Rethinking plpgsql's assignment implementation
Date: 2020-12-15 20:17:30
Message-ID: 1311116.1608063450@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

I realized that the speedup patch I posted yesterday is flawed: it's
too aggressive about applying the R/W param mechanism, instead of
not aggressive enough.

To review, the point of that logic is that if we have an assignment
like
arrayvar := array_append(arrayvar, some-scalar-expression);
a naive implementation would have array_append construct an entire
new array, which we'd then have to copy into plpgsql's variable
storage. Instead, if the array variable is in expanded-array
format (which plpgsql encourages it to be) then we can pass the
array parameter as a "read/write expanded datum", which array_append
recognizes as license to scribble right on its input and return the
modified input; that takes only O(1) time not O(N). Then plpgsql's
assignment code notices that the expression result datum is the same
pointer already stored in the variable, so it does nothing.

With the patch at hand, a subscripted assignment a[i] := x becomes,
essentially,
a := subscriptingref(a, i, x);
and we need to make the same sort of transformation to allow
array_set_element to scribble right on the original value of "a"
instead of making a copy.

However, we can't simply not consider the source expression "x",
as I proposed yesterday. For example, if we have
a := subscriptingref(a, i, f(array_append(a, x)));
it's not okay for array_append() to scribble on "a". The R/W
param mechanism normally disallows any additional references to
the target variable, which would prevent this error, but I broke
that safety check with the 0007 patch.

After thinking about this awhile, I decided that plpgsql's R/W param
mechanism is really misdesigned. Instead of requiring the assignment
source expression to be such that *all* its references to the target
variable could be passed as R/W, we really want to identify *one*
reference to the target variable to be passed as R/W, allowing any other
ones to be passed read/only as they would be by default. As long as the
R/W reference is a direct argument to the top-level (hence last to be
executed) function in the expression, there is no harm in R/O references
being passed to other lower parts of the expression. Nor is there any
use-case for more than one argument of the top-level function being R/W.

So the attached rewrite of the 0007 patch reimplements that logic to
identify one single Param that references the target variable, and
make only that Param pass a read/write reference, not any other
Params referencing the target variable. This is a good change even
without considering the assignment-reimplementation proposal, because
even before this patchset we could have cases like
arrayvar := array_append(arrayvar, arrayvar[i]);
The existing code would be afraid to optimize this, but it's in fact
safe.

I also re-attach the 0001-0006 patches, which have not changed, just
to keep the cfbot happy.

regards, tom lane

Attachment Content-Type Size
v1-0001-add-raw-parse-modes.patch text/x-diff 8.8 KB
v1-0002-add-spi-prepare-extended.patch text/x-diff 12.2 KB
v1-0003-fix-plpgsql-expressions.patch text/x-diff 32.5 KB
v1-0004-rewrite-plpgsql-assignment.patch text/x-diff 31.9 KB
v1-0005-add-docs-and-tests.patch text/x-diff 21.3 KB
v1-0006-remove-arrayelem-datums.patch text/x-diff 15.8 KB
v2-0007-rethink-rw-param-mechanism.patch text/x-diff 13.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2020-12-15 20:54:51 Re: Minor documentation error regarding streaming replication protocol
Previous Message Pavel Stehule 2020-12-15 19:50:33 Re: SQL/JSON: functions