Re: [HACKERS] generated columns

From: Michael Paquier <michael(dot)paquier(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: Jaime Casanova <jaime(dot)casanova(at)2ndquadrant(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] generated columns
Date: 2018-01-19 05:18:04
Message-ID: 20180119051804.GC12826@paquier.xyz
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Jan 16, 2018 at 09:55:16AM -0500, Peter Eisentraut wrote:
> Here you go. Those changes actually meant that genbki.pl doesn't need
> to be touched by this patch at all, so that's a small win.

Thanks for the updated version. I have spent some time looking at what
you are proposing here.

Instead of leaving bits for a feature that may or may not be
implemented, have you considered just blocking STORED at parsing level
and remove those bits? There is little point in keeping the equivalent
of dead code in the tree. So I would suggest a patch simplification:
- Drop the VIRTUAL/STORED parsing from the grammar for now.
- Define VIRTUAL as the default for the future.
This way, if support for STORED is added in the future the grammar can
just be extended. This is actually implied in pg_dump.c. And +1 for the
catalog format you are proposing.

=# CREATE TABLE gen_1 (a int, b int GENERATED ALWAYS AS (a * 2)
VIRTUAL);
CREATE TABLE
=# insert into gen_1 values (2000000000);
INSERT 0 1
=# select * from gen_1 ;
ERROR: 22003: integer out of range
Overflow checks do not happen when inserting, which makes the following
SELECT to fail. This could be confusing for the user because the row has
been inserted. Perhaps some evaluation of virtual tuples at insert phase
should happen. The existing behavior makes sense as well as virtual
values are only evaluated when read, so a test would be at least
welcome. Does the SQL spec mention the matter? How do other systems
handle such cases? CHECK constraints can be combined, still..

The last patch crashes for typed tables:
=# CREATE TYPE itest_type AS (f1 integer, f2 text, f3 bigint);
CREATE TYPE
=# CREATE TABLE itest12 OF itest_type (f1 WITH OPTIONS GENERATED ALWAYS
AS (f2 *2));
[... boom ...]
And for partitions:
=# CREATE TABLE itest_parent (f1 date NOT NULL, f2 text, f3 bigint)
PARTITION BY RANGE (f1);
CREATE TABLE
=# CREATE TABLE itest_child PARTITION OF itest_parent (f3 WITH OPTIONS
GENERATED ALWAYS AS (f3)) FOR VALUES FROM ('2016-07-01') TO
('2016-08-01');
[... boom ...]
Like what we did in 005ac298, I would suggest to throw
ERRCODE_FEATURE_NOT_SUPPORTED. Please also add some tests.

+SELECT a, c FROM gtest12; -- FIXME: should be allowed
+ERROR: permission denied for function gf1
[...]
+ALTER TABLE gtest27 ALTER COLUMN b DROP DEFAULT; -- FIXME
+ERROR: column "b" of relation "gtest27" is a generated column
Any fixes for those?

+ if (get_attgenerated(relationId, attno))
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("index creation on generated columns is not supported")));
Shouldn't such messages mention explicitely "virtually"-generated
columns? For stored columns the support would not be complicated in this
case.

=# create table ac (a int, b text generated always as (substr(a::text, 0, 3)));
CREATE TABLE
=# alter table ac alter COLUMN a type text;
ERROR: 42601: cannot alter type of a column used by a generated column
DETAIL: Column "a" is used by generated column "b".
In this case ALTER TABLE could have worked. No complain from me to
disable that as a first step though.

+UPDATE gtest1 SET b = DEFAULT WHERE a = 1;
+UPDATE gtest1 SET b = 11 WHERE a = 1; -- error
+ERROR: column "b" can only be updated to DEFAULT
+DETAIL: Column "b" is a generated column.
I see... This is consistent with the behavior of INSERT where DEFAULT
can be used and the INSERT succeeds.

+-- domains
+CREATE DOMAIN gtestdomain1 AS int CHECK (VALUE < 10);
+CREATE TABLE gtest24 (a int PRIMARY KEY, b gtestdomain1 GENERATED
ALWAYS AS (a * 2)); -- prohibited
+ERROR: virtual generated column "b" cannot have a domain type
CHECK constraints can be used, so I find this restriction confusing.

No test coverage for DELETE triggers?

+UPDATE gtest26 SET a = a * -2;
+INFO: gtest1: old = (-2,)
+INFO: gtest1: new = (4,)
+INFO: gtest3: old = (-2,)
+INFO: gtest3: new = (4,)
+INFO: gtest4: old = (3,)
+INFO: gtest4: new = (-6,)
OLD and NEW values for generated columns don't show up. Am I missing
something or they should be available?

@@ -15526,13 +15553,6 @@ dumpTableSchema(Archive *fout, TableInfo *tbinfo)
- if (has_default)
- appendPQExpBuffer(q, " DEFAULT %s",
- tbinfo->attrdefs[j]->adef_expr);
-
- if (has_notnull)
- appendPQExpBufferStr(q, " NOT NULL");
Why does the ordering of actions need to be changed here?

[nit_mode]
+ if (attgenerated)
+ /*
+ * Generated column: Dropping anything that the generation expression
+ * refers to automatically drops the generated column.
+ */
+ recordDependencyOnSingleRelExpr(&colobject, expr, RelationGetRelid(rel),
+ DEPENDENCY_AUTO,
+ DEPENDENCY_AUTO, false);
Please use brackers here if you use a comment in the if() block...
[/nit_mode]

COPY as you are proposing looks sensible to me. I am not sure about any
options though as it is possible to enforce the selection of generated
columns using COPY (SELECT).

Per my tests, generated columns can work with column system attributes
(xmax, xmin, etc.). Some tests could be added.

+ /*
+ * Generated columns don't use the attnotnull field but use a full CHECK
+ * constraint instead. We could implement here that it finds that CHECK
+ * constraint and drops it, which is kind of what the SQL standard would
+ * require anyway, but that would be quite a bit more work.
+ */
+ if (((Form_pg_attribute) GETSTRUCT(tuple))->attgenerated)
+ ereport(ERROR,
+ (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
+ errmsg("cannot use DROP NOT NULL on generated column \"%s\"",
+ colName)));
And the point of storing NOT NULL constraints as CHECK constraints shows
up again... :( It would be nice to mention as well at the top of
ATExecSetNotNull() that a full-fledge switch could help generated
columns as well.

- if (tab->relkind == RELKIND_RELATION ||
- tab->relkind == RELKIND_PARTITIONED_TABLE)
+ if ((tab->relkind == RELKIND_RELATION ||
+ tab->relkind == RELKIND_PARTITIONED_TABLE) &&
+ get_attgenerated(RelationGetRelid(rel), attnum) != ATTRIBUTE_GENERATE
I think that you should store the result of get_attgenerated() and reuse
it multiple times.
--
Michael

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Froehlich 2018-01-19 05:23:26 STATISTICS retained in CREATE TABLE ... LIKE (INCLUDING ALL)?
Previous Message Amit Langote 2018-01-19 04:36:26 Re: [HACKERS] path toward faster partition pruning