Re: Fast AT ADD COLUMN with DEFAULTs

From: Serge Rielau <serge(at)rielau(dot)com>
To: Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fast AT ADD COLUMN with DEFAULTs
Date: 2016-10-05 21:56:31
Message-ID: 2e441c6a-0835-403f-a342-8c8df9ea5955@rielau.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Oct 5, 2016 at 2:45 PM, Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com> wrote:
On 10/5/16, Serge Rielau <serge(at)rielau(dot)com> wrote:
> Dear Hackers,
>
> I’m working on a patch that expands PG’s ability to add columns to a table
> without a table rewrite (i.e. at O(1) cost) from the
> nullable-without-default to a more general case. E.g.
...
> Is there an interest in principle in the community for this functionality?

Wow! I think it would be great! It also solves huge vacuuming after
rewriting the table(s).
Just pay attention to corner cases like indexes, statistics and speed. Yes, Yes, and still analyzing speed
But I'd like to see solution for more important cases like:
CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);
INSERT INTO t VALUES (1), (2), (3);
ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';
SELECT * FROM t ORDER BY pk;
ALTER TABLE t ADD COLUMN c2 serial;
SELECT * FROM t ORDER BY pk;
INSERT INTO t(pk) VALUES (4);
SELECT * FROM t ORDER BY pk; By solution I think you mean a semantic change from what it is doing today which is: * “Now” is fixed to ALTER TABLE time for all pre-existing rows * serial will fill in the same value for all pre-existing rows Having different semantics for those would require a rewrite and probably different syntax in some form.
This is what my patch does on our PG derivative today: CREATE TABLE t (pk INT NOT NULL PRIMARY KEY); CREATE TABLE postgres=# INSERT INTO t VALUES (1), (2), (3); INSERT 0 3 postgres=# ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now'; ALTER TABLE postgres=# SELECT * FROM t ORDER BY pk; pk | c1 ----+------------------------------- 1 | 2016-10-05 21:47:58.919194+00 2 | 2016-10-05 21:47:58.919194+00 3 | 2016-10-05 21:47:58.919194+00 (3 rows)
postgres=# postgres=# ALTER TABLE t ADD COLUMN c2 serial; SELECT * FROM t ORDER BY pk; INSERT INTO t(pk) VALUES (4); SELECT * FROM t ORDER BY pk;
ALTER TABLE t ADD COLUMN c2 serial; ALTER TABLE postgres=# SELECT * FROM t ORDER BY pk; pk | c1 | c2 ----+-------------------------------+---- 1 | 2016-10-05 21:47:58.919194+00 | 1 2 | 2016-10-05 21:47:58.919194+00 | 1 3 | 2016-10-05 21:47:58.919194+00 | 1 (3 rows)
postgres=# INSERT INTO t(pk) VALUES (4); INSERT 0 1 postgres=# SELECT * FROM t ORDER BY pk; pk | c1 | c2 ----+-------------------------------+---- 1 | 2016-10-05 21:47:58.919194+00 | 1 2 | 2016-10-05 21:47:58.919194+00 | 1 3 | 2016-10-05 21:47:58.919194+00 | 1 4 | 2016-10-05 21:47:58.919194+00 | 2 (4 rows) P.S.: I really think it is a good idea, just some research is
necessary and covering corner cases... Thanks. This would be my first contribution. I take it I would post a patch based on a recent PG 9.6 master for review? Or should I compose some sort of a design document?
Cheers Serge Rielau Salesforce.com

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2016-10-05 22:15:40 Re: Fast AT ADD COLUMN with DEFAULTs
Previous Message Tom Lane 2016-10-05 21:50:30 Switch to unnamed POSIX semaphores as our preferred sema code?