Re: Adding OLD/NEW support to RETURNING

From: Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Adding OLD/NEW support to RETURNING
Date: 2024-03-11 14:03:35
Message-ID: CAEZATCUC2WPFuJCu4yynNEDzD5htPOHoKe0bECBLsCxL1CxV+Q@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, 10 Mar 2024 at 23:41, jian he <jian(dot)universality(at)gmail(dot)com> wrote:
>
> Hi,
> some issues I found, while playing around with
> support-returning-old-new-v2.patch
>

Thanks for testing. This is very useful.

> doc/src/sgml/ref/update.sgml:
>
> There is no parameter explanation for `*`.
> so, I think the synopsis may not cover cases like:
> `
> update foo set f3 = 443 RETURNING new.*;
> `
> I saw the explanation at output_alias, though.

"*" is documented under output_alias and output_expression. I'm not
sure that it makes sense to have a separate top-level parameter
section for it, because "*" is also something that can appear after
table_name, meaning something completely different, so it might get
confusing. Perhaps the explanation under output_expression can be
expanded a bit. I'll think about it some more.

> insert into foo select 1, 2 RETURNING old.*, new.f2, old.f1();
> ERROR: function old.f1() does not exist
> LINE 1: ...sert into foo select 1, 2 RETURNING old.*, new.f2, old.f1();
> ^
> HINT: No function matches the given name and argument types. You
> might need to add explicit type casts.

Yes, that's consistent with current behaviour. You can also write
foo.f1() or something_else.f1(). Anything of that form, with
parentheses, is interpreted as schema_name.function_name(), not as a
column reference.

> I found a fancy expression:
> `
> CREATE TABLE foo (f1 serial, f2 text, f3 int default 42);
> insert into foo select 1, 2 union select 11, 22 RETURNING old.*,
> new.f2, (select sum(new.f1) over());
> `
> is this ok?

Yes, I guess it's OK, though not really useful in practice.

"new.f1" is 1 for the first row and 11 for the second. When you write
"(select sum(new.f1) over())", with no FROM clause, you're implicitly
evaluating over a table with 1 row in the subquery, so it just returns
new.f1.

This is the same as the standalone query

SELECT sum(11) OVER();
sum
-----
11
(1 row)

So it's likely that any window function can be used in a FROM-less
subquery inside a RETURNING expression. I can't think of any practical
use for it though. In any case, this isn't something new to this
patch.

> also the following works on PG16, not sure it's a bug.
> `
> insert into foo select 1, 2 union select 11, 22 RETURNING (select count(*));
> `

This is OK, because that subquery is an uncorrelated aggregate query
that doesn't reference the outer query. In this case, it's not very
interesting, because it lacks a FROM clause, so it just returns 1. But
you could also write "(SELECT count(*) FROM some_other_table WHERE
...)", and it would work because the aggregate function is evaluated
over the rows of the table in the subquery. That's more useful if the
subquery is made into a correlated subquery by referring to columns
from the outer query. The rules for that are documented here:

https://www.postgresql.org/docs/current/sql-expressions.html#SYNTAX-AGGREGATES:~:text=When%20an%20aggregate%20expression%20appears%20in%20a%20subquery

> but not these
> `
> insert into foo select 1, 2 union select 11, 22 RETURNING (select
> count(old.*));
> insert into foo select 1, 2 union select 11, 22 RETURNING (select sum(f1));
> `

In these cases, since the aggregate's arguments are all outer-level
variables, it is associated with the outer query, so it is rejected on
the grounds that aggregate functions aren't allowed in RETURNING.

It is allowed if that subquery has a FROM clause, since the aggregated
arguments are then treated as constants over the rows in the subquery,
so arguably the same could be made to happen without a FROM clause,
but there really is no practical use case for allowing that. Again,
this isn't something new to this patch.

> I found another interesting case, while trying to add some tests on
> for new code in createplan.c.
> in postgres_fdw.sql, right after line `MERGE ought to fail cleanly`
>
> --this will work
> insert into itrtest select 1, 'foo' returning new.*,old.*;
> --these two will fail
> insert into remp1 select 1, 'foo' returning new.*;
> insert into remp1 select 1, 'foo' returning old.*;
>
> itrtest is the partitioned non-foreign table.
> remp1 is the partition of itrtest, foreign table.

Hmm, I was a little surprised that that first example worked, but I
can see why now.

I was content to just say that RETURNING old/new wasn't supported for
foreign tables in this first version, but looking at it more closely,
the only tricky part is direct-modify updates. So if we just disable
direct-modify when there are OLD/NEW variables in the the RETURNING
list, then it "just works".

So I've done that, and added a few additional tests to
postgres_fdw.sql, and removed the doc notes about foreign tables not
being supported. I really thought that there would be more to it than
that, but it seems to work fine.

> I did find a segment fault bug.
> insert into foo select 1, 2 RETURNING (select sum(old.f1) over());
>
> This information is set in a subplan node.
> /* update the ExprState's flags if Var refers to OLD/NEW */
> if (variable->varreturningtype == VAR_RETURNING_OLD)
> state->flags |= EEO_FLAG_HAS_OLD;
> else if (variable->varreturningtype == VAR_RETURNING_NEW)
> state->flags |= EEO_FLAG_HAS_NEW;
>
> but in ExecInsert it didn't use subplan node state->flags information

Ah, good catch!

When recursively initialising a SubPlan, if any of its expressions is
found to contain OLD/NEW Vars, it needs to update the flags on the
parent ExprState. Fixed in the new version.

> @@ -411,12 +411,21 @@ slot_getsysattr(TupleTableSlot *slot, in
> {
> Assert(attnum < 0); /* caller error */
>
> + /*
> + * If the tid is not valid, there is no physical row, and all system
> + * attributes are deemed to be NULL, except for the tableoid.
> + */
> if (attnum == TableOidAttributeNumber)
> {
> *isnull = false;
> return ObjectIdGetDatum(slot->tts_tableOid);
> }
> - else if (attnum == SelfItemPointerAttributeNumber)
> + if (!ItemPointerIsValid(&slot->tts_tid))
> + {
> + *isnull = true;
> + return PointerGetDatum(NULL);
> + }
> + if (attnum == SelfItemPointerAttributeNumber)
> {
> *isnull = false;
> return PointerGetDatum(&slot->tts_tid);
>
> These changes is slot_getsysattr is somehow independ of this feature?

This is necessary because under some circumstances, when returning
old/new, the corresponding table slot may contain all NULLs and an
invalid ctid. For example, the old slot in an INSERT which didn't do
an ON CONFLICT UPDATE. So we need to guard against that, in case the
user tries to return old.ctid, for example. It's useful to always
return a non-NULL tableoid though, because that's a property of the
table, rather than the row.

Thanks for testing.

Attached is an updated patch, fixing the seg-fault and now with
support for foreign tables.

Regards,
Dean

Attachment Content-Type Size
support-returning-old-new-v3.patch text/x-patch 136.3 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Nathan Bossart 2024-03-11 14:13:57 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Jacob Champion 2024-03-11 13:47:16 Re: WIP Incremental JSON Parser