Re: [HACKERS] Behavior of GENERATED columns per SQL2003

From: Zoltan Boszormenyi <zb(at)cybertec(dot)at>
To: pgsql-patches(at)postgresql(dot)org
Subject: Re: [HACKERS] Behavior of GENERATED columns per SQL2003
Date: 2007-05-08 10:46:34
Message-ID: 4640550A.6070606@cybertec.at
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-patches

Forwarded to -patches because of the attachment.

-------- Eredeti üzenet --------
Tárgy: Re: [HACKERS] Behavior of GENERATED columns per SQL2003
Dátum: Tue, 08 May 2007 12:38:32 +0200
Feladó: Zoltan Boszormenyi <zb(at)cybertec(dot)at>
Címzett: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
CC: pgsql-hackers(at)postgreSQL(dot)org
Hivatkozások: <18812(dot)1178572575(at)sss(dot)pgh(dot)pa(dot)us>

Tom Lane írta:
> I've been studying the SQL spec in a bit more detail and I'm suddenly
> thinking that we've got the behavior all wrong in the current
> GENERATED/IDENTITY patch. In particular, it looks to me like we've
> been implementing GENERATED ALWAYS AS (expr) according to the rules
> that the spec in fact lays down only for GENERATED ALWAYS AS IDENTITY.
> You'd think the two constructs would be pretty closely related but
> the behaviors specified by the spec are light-years apart. If you
> look closely, a "generated column" in the meaning of section 4.14.8
> is one that has GENERATED ALWAYS AS (expr), and identity columns are
> *not* in this class.
>

True.

> It looks to me like the behavior the spec intends for a generated column
> is actually that it can be implemented as a "virtual column" occupying
> no space on disk and instead computed on-the-fly when retrieved.
>

I think the keyword here is 'can be'. But having it stored gives a nice
speedup in SELECTs vs. no speedup if you generate it on the fly.

> Identity columns can have their values overridden by the
> user (it's a little harder if GENERATED ALWAYS, but still possible),
> and they don't change during an UPDATE unless specifically forced to.
>

Yes, I implemented it this way.

> In contrast, generated columns cannot be overridden by
> assignment, and are recomputed from their base columns during updates.
>

I see, I incorrectly made OVERRIDING SYSTEM VALUE
to have an effect on generated columns. Now I don't need
to pass around the list of the modified fields from
rewriteTargetList(), I simply have to blindly update all of them
both in INSERT and UPDATE. Fixed. And now I don't have to
invent something to discover what fields were modified by
BEFORE triggers.

> This realization also explains the following, otherwise rather strange,
> facts:
>
> * There is no GENERATED BY DEFAULT AS (expr) in the spec.
>

Yes, and because it already exists and called DEFAULT.

> * GENERATED expressions are specifically disallowed from containing
> subselects, calling functions that access any SQL-data, or being
> nondeterministic; hence their values depend solely on the regular
> columns in the same row.
>

The sanity checks for the DEFAULT expression already
handle subselect in PostgreSQL:

db=# create table t1 (id float generated always as identity, t text,
g text generated always as (case when t is null then '' else t end ||
(select max(id) from t1));
NOTICE: CREATE TABLE will create implicit sequence "t1_id_seq" for
serial column "t1.id"
ERROR: cannot use subquery in default expression

But how do you check a function in general?
Especially when it's not written in plpgsql?
E.g. a C function can use SPI and SELECTs.

Also, SQL:2003 doesn't allow any functions for the DEFAULT clause
besides timestamp functions, e.g. NOW().
But PostgreSQL already allows non-IMMUTABLE functions used in
DEFAULT clauses. Would you want to restrict it?

> * While identity columns are updated (if needed) before execution of
> BEFORE triggers, generated columns are updated after BEFORE triggers;
> hence a BEFORE trigger can override the value in one case and not the
> other. (The current patch gets this wrong, btw.)
>

Where do you see that? Which version were you looking at?
Identity columns are generated in rewriteTargetList(), way before any
triggers.
Generated column are computed in ExecInsert() and ExecUpdate(),
certainly after applying BEFORE triggers in both cases and
before CheckConstraint(). There was one bug in the UPDATE case, though,
as UPDATE loops if it couldn't do its job in one go because of
serialization.
I fixed this.

> * Generated columns are forcibly updated when their base columns change
> as a result of FK constraints (such as ON UPDATE CASCADE).
>

Isn't ExecUpdate() called on the referring table's row during such FK event?
It seems yes to me:

db=# create table t1 (id serial generated always as identity primary key,
t text generated always as (id || '_1'));
NOTICE: CREATE TABLE will create implicit sequence "t1_id_seq" for
serial column "t1.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t1_pkey"
for table "t1"
CREATE TABLE
db=# create table t2 (id serial generated always as identity primary key,
id_t1 integer not null references t1(id) on update cascade,
g text generated always as (id_t1 || '_1'));
NOTICE: CREATE TABLE will create implicit sequence "t2_id_seq" for
serial column "t2.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "t2_pkey"
for table "t2"
CREATE TABLE
db=# insert into t1 values (default, 'a1');
INSERT 0 1
db=# insert into t1 values (default, 'b2');
INSERT 0 1
db=# insert into t1 values (default, 'c3');
INSERT 0 1
db=# insert into t1 values (default, 'd4');
INSERT 0 1
db=# select * from t1;
id | t
----+-----
1 | 1_1
2 | 2_1
3 | 3_1
4 | 4_1
(4 rows)

db=# insert into t2 values (default, 2);
INSERT 0 1
db=# select * from t2;
id | id_t1 | g
----+-------+-----
1 | 2 | 2_1
(1 row)

db=# update t1 set id = 77 where id = 2;
UPDATE 1
db=# select * from t1;
id | t
----+------
1 | 1_1
3 | 3_1
4 | 4_1
77 | 77_1 (<--- This was buggy, the generated value didn't change here,
as an oversight during the rewrite to
make the generation behaviour
a property of the DEFAULT clause. Now
fixed.)
(4 rows)

zozo=# select * from t2;
id | id_t1 | g
----+-------+------
1 | 77 | 77_1 (<---- It worked this way before, too.)
(1 row)

Anyway, a new patch is necessary it seems,
so it should apply cleanly to new CVS and have the
above bugs fixed. Attached.

> It looks to me like a BEFORE trigger is actually the only place that can
> (transiently) see values of a generated column that are different from
> the result of applying the generation expression on the rest of the row.
> It's unclear whether that's intentional or an oversight.
>

I thought it was intentional. BEFORE triggers can change the base columns
before storing them, but the generated columns should be consistent
with their generation expressions and their base columns after executing
the BEFORE triggers. If the triggers should see generated values
then another recomputation is necessary to refresh the generated
columns AND to make them unchangable by the triggers.
It seems to be a waste of cycles to me. It should be documented, though.

> Is anyone familiar with a database that implements SQL-spec generated
> columns? Do they actually store the columns?
>

David Fuhry answered this question with the advertised intention.

> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq
>
>

Best regards,
Zoltán Böszörményi

--
----------------------------------
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/

--
----------------------------------
Zoltán Böszörményi
Cybertec Geschwinde & Schönig GmbH
http://www.postgresql.at/

Attachment Content-Type Size
psql-serial-45.diff.gz application/x-tar 28.7 KB

Responses

Browse pgsql-patches by date

  From Date Subject
Next Message Tom Lane 2007-05-08 15:02:41 Re: [HACKERS] Behavior of GENERATED columns per SQL2003
Previous Message Neil Conway 2007-05-07 22:25:19 Re: Implemented current_query