Re: identity columns

From: Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com>
To: Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com>
Cc: pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: identity columns
Date: 2017-03-20 09:43:24
Message-ID: CAKOSWNmKbthEdV0m1J=V7r6nDyPk8diKwHW93jYbB+wnXd-c1A@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2/28/17, Peter Eisentraut <peter(dot)eisentraut(at)2ndquadrant(dot)com> wrote:
> New patch that fixes everything. ;-)

Great work!

> On 1/4/17 19:34, Vitaly Burovoy wrote:
>> 1. The fact COPY ignores GENERATED ALWAYS constraint (treats as
>> GENERATED BY DEFAULT) should be mentioned as well as rules.
>
> fixed (documentation added)
>
> What do you mean for rules?

I meant the phrase "However, it will not invoke rules." mentioned there.
For rewrite rules this behavior is mentioned on the COPY page, not on
the "CREATE RULE" page.
I think it would be better if that behavior is placed on both CREATE
TABLE and COPY pages.

>> 2. Usually error message for identical columns (with LIKE clause)
>> looks like this:
>> test=# CREATE TABLE def(i serial, n1 int NOT NULL, n2 int);
>> CREATE TABLE
>> test=# CREATE TABLE test(i serial, LIKE def INCLUDING ALL);
>> ERROR: column "i" specified more than once
>>
>> but for generated columns it is disorienting enough:
>> test=# CREATE TABLE idnt(i int GENERATED BY DEFAULT AS IDENTITY);
>> CREATE TABLE
>> test=# CREATE TABLE test(i int GENERATED BY DEFAULT AS IDENTITY, LIKE
>> idnt INCLUDING ALL);
>> ERROR: relation "test_i_seq" already exists
>
> Yeah, this is kind of hard to fix though, because the sequence names
> are decided during parse analysis, when we don't see that the same
> sequence name is also used by another new column. We could maybe
> patch around it in an ugly way, but it doesn't seem worth it for this
> marginal case.

OK

>> 3. Strange error (not about absence of a column; but see pp.5 and 8):
>> test=# ALTER TABLE idnt ALTER COLUMN o ADD GENERATED ALWAYS AS IDENTITY;
>> ERROR: identity column type must be smallint, integer, or bigint
>
> What's wrong with that?

The column mentioned there does not exist. Therefore the error should
be about it, not about a type of absent column:
test=# CREATE TABLE idnt(i int GENERATED BY DEFAULT AS IDENTITY, t TEXT);
CREATE TABLE
test=# ALTER TABLE idnt ALTER COLUMN o TYPE int; -- expected error message
ERROR: 42703: column "o" of relation "idnt" does not exist
test=# -- compare with:
test=# ALTER TABLE idnt ALTER COLUMN o ADD GENERATED ALWAYS AS
IDENTITY; -- strange error message
ERROR: identity column type must be smallint, integer, or bigint

>> 5. Attempt to make a column be identity fails:
>> test=# ALTER TABLE idnt ALTER COLUMN n1 SET GENERATED BY DEFAULT;
>> ERROR: column "n1" of relation "idnt" is not an identity column
>>
>> As I understand from the Spec, chapter 11.20 General Rule 2)b) says
>> about the final state of a column without mentioning of the initial
>> state.
>> Therefore even if the column has the initial state "not generated",
>> after the command it changes the state to either "GENERATED ALWAYS" or
>> "GENERATED BY DEFAULT".
>
> 11.12 <alter column definition> states that "If <alter identity column
> specification> or <drop identity property clause> is specified, then C
> shall be an identity column." So this clause is only to alter an
> existing identity column, not make a new one.

Ouch. Right. The rule was at the upper level.

Nevertheless even now we don't follow rules directly.
For example, we allow "SET NOT NULL" and "TYPE" for the column which
are restricted by the spec.
Since we extend the spec by "ADD GENERATED...", "SEQUENCE NAME" and
allow more than one identity column, why can't we extend it by
allowing "SET GENERATED" for non-identity columns and drop "ADD
GENERATED..."?

>> 6. The docs mention a syntax:
>> ALTER [ COLUMN ] <replaceable
>> class="PARAMETER">column_name</replaceable> { SET GENERATED { ALWAYS |
>> BY DEFAULT } | SET <replaceable>sequence_option</replaceable> | RESET
>> } [...]
>>
>> but the command fails:
>> test=# ALTER TABLE idnt ALTER COLUMN i RESET;
>> ERROR: syntax error at or near ";"
>> LINE 1: ALTER TABLE idnt ALTER COLUMN i RESET;
>
> This works for me. Check again please.

I'm sorry, it still does not work for me (whether you mix it up with "RESTART"):
test=# ALTER TABLE idnt ALTER COLUMN i RESET;
ERROR: syntax error at or near ";"
LINE 1: ALTER TABLE idnt ALTER COLUMN i RESET;

>> 7. (the same line of the doc)
>> usually ellipsis appears in inside parenthesis with clauses can be
>> repeated, in other words it should be written as:
>> ALTER <skipped> column_name</replaceable> ( { SET GENERATED { ALWAYS |
>> BY DEFAULT } | <skipped> } [...] )
>
> But there are no parentheses in that syntax. I think the syntax
> synopsis as written is correct.

I was wrong. Your version is correct.

>> 9. The command fails:
>> test=# ALTER TABLE idnt ALTER n2 ADD GENERATED ALWAYS AS IDENTITY;
>> ERROR: column "n2" of relation "idnt" must be declared NOT NULL
>> before identity can be added
>>
>> whereas the Spec does not contains a requirement for a column to be a
>> NOT NULLABLE.
>> You can implicitly set a column as NOT NULL (as the "serial" macros
>> does), but not require it later.
>
> The spec requires that an identity column is NOT NULL.

OK. "11.4 SR 16)d" really says "The <column constraint definition> NOT
NULL NOT DEFERRABLE is implicit."

>> 11. The last CF added partitioned tables which are similar to
>> inherited, but slightly different.
>
> fixed

Something is left to be fixed:
test=# CREATE TABLE measurement (
test(# i int GENERATED BY DEFAULT AS IDENTITY,
test(# logdate date not null,
test(# peaktemp int,
test(# unitsales int
test(# ) PARTITION BY RANGE (logdate);
CREATE TABLE
test=# CREATE TABLE measurement_y2016m07
test-# PARTITION OF measurement (
test(# unitsales DEFAULT 0
test(# ) FOR VALUES FROM ('2016-07-01') TO ('2016-08-01');
CREATE TABLE
test=# ALTER TABLE MEASUREMENT ALTER i RESTART 2;
ERROR: column "i" of relation "measurement_y2016m07" is not an identity column

>> 13. Also strange error message:
>> ...
>> test=# DROP TABLE sch.idnt;
>> ERROR: could not find tuple for attrdef 0
>
> I can't reproduce this. Can you give me a complete command sequence
> from scratch?

Since you don't use defaults (which are linked by the "attrdef") it is
not relevant now.

> --
> Peter Eisentraut http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>

The patch should be rebased to the current master. It has easy
conflicts in describe.c in the commit
395bfaae8e786eb17fc9dd79e4636f97c9d9b820.
Please, don't include the file catversion.h in it because it is
changed often and leads to error when applying.

New changes fix old bugs and introduce new ones.

16. changing a type does not change an underlying sequence type (its limits):
test=# CREATE TABLE itest3 (a smallint generated by default as
identity (start with 32767 increment by 5), b text);
CREATE TABLE
test=# ALTER TABLE itest3 ALTER a TYPE bigint;
ALTER TABLE
test=# INSERT INTO itest3(b) VALUES ('a');
INSERT 0 1
test=# INSERT INTO itest3(b) VALUES ('b');
ERROR: nextval: reached maximum value of sequence "itest3_a_seq" (32767)

On the one hand according to the spec it is not possible to change a
type of an identity column, on the other hand the spec says nothing
about different numeric types (int2, int4, int8). The worst thing is
that it is hard to understand which sequence is used (without a
"default") and since the ALTER TABLE is finished without errors users
may think everything is OK, but really it is not.

17. how to restart a sequence?
test=# ALTER TABLE itest3 ALTER a SET RESTART 2;
ERROR: sequence option "restart" not supported here
LINE 1: ALTER TABLE itest3 ALTER a SET RESTART 2;

Khm. The "RESTART" is one of official "sequence_options" (comparable
to the "SEQUENCE NAME"), why it is not allowed?
But there is another DDL which works OK, but not reflected in the docs:
test=# ALTER TABLE itest3 ALTER a RESTART 2;
ALTER TABLE

18. If there is no sequence owned by a column, all DDLs for a sequence
behind an identity column do not raise errors but do nothing:
test=# CREATE TABLE itest3 (a int generated by default as identity
(start with 7), b text);
CREATE TABLE
test=# ALTER SEQUENCE itest3_a_seq OWNED BY NONE;
ALTER SEQUENCE
test=# ALTER TABLE itest3 ALTER a SET START 10;
ALTER TABLE
test=# -- sequence has not changed
test=# \d itest3_a_seq
Sequence "public.itest3_a_seq"
Column | Type | Value
------------+---------+-------
last_value | bigint | 7
log_cnt | bigint | 0
is_called | boolean | f
test=# -- the table is still "generated by default"
test=# \d itest3
Table "public.itest3"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+----------------------------------
a | integer | | not null | generated by default as identity
b | text | | |
test=# INSERT INTO itest3(b)VALUES('a'); -- errors appear only when it
is changing
ERROR: no owned sequence found

and the table will be dumped without errors with columns as non-identity.

19. If anyone occasionally drops OWNED BY for the linked sequence
there is no ways to restore it "as was":
test=# CREATE TABLE itest3 (a int generated by default as identity, b text);
CREATE TABLE
test=# ALTER SEQUENCE itest3_a_seq OWNED BY NONE; -- erroneous
ALTER SEQUENCE
test=# INSERT INTO itest3(b)VALUES('a'); -- Ouch! Error!!!
ERROR: no owned sequence found
test=# ALTER SEQUENCE itest3_a_seq OWNED BY itest3.a; -- try to restore
ALTER SEQUENCE
test=# INSERT INTO itest3(b)VALUES('a');
INSERT 0 1

It seems it is OK, all works perfect. But after dump/restore the
column looses the "identity" property:
test2=# \d itest3
Table "public.itest3"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+---------
a | integer | | not null |
b | text | | |

It is a hidden bomb, because process of restoring seems normal (which
means dumps are OK), but a users' code will not work correctly.

20. sequence does not follow the table owned by:
test=# CREATE TABLE itest3 (a int generated by default as identity, b text);
CREATE TABLE
test=# CREATE SCHEMA sch;
CREATE SCHEMA
test=# ALTER TABLE itest3 SET SCHEMA sch;
ALTER TABLE
test=# \d
List of relations
Schema | Name | Type | Owner
--------+--------------+----------+----------
public | itest3_a_seq | sequence | postgres
(
(1 row)
test=# \d sch.*
Table "sch.itest3"
Column | Type | Collation | Nullable | Default
--------+---------+-----------+----------+----------------------------------
a | integer | | not null | generated by default as identity
b | text | | |

Also dump/restore fails with:
ERROR: relation "itest3" does not exist

21. There are many places where error codes look strange:
test=# ALTER TABLE itest3 ALTER a SET SEQUENCE NAME "ww"; -- expected
error code
ERROR: 42601: invalid sequence option SEQUENCE NAME
LINE 1: ALTER TABLE itest3 ALTER a SET SEQUENCE NAME "ww";

42601 = syntax_error

but
test=# ALTER TABLE itest3 ALTER b SET GENERATED BY DEFAULT; --
whether it is "internal_error" or user's error?
ERROR: XX000: column "b" of relation "itest3" is not an identity column

XX000 = internal_error
whether the error 42611 (invalid_column_definition) is good enough for it?

--
Best regards,
Vitaly Burovoy

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2017-03-20 10:14:11 Re: Size vs size_t
Previous Message Andres Freund 2017-03-20 09:33:29 Re: Logical decoding on standby