Re: Fast AT ADD COLUMN with DEFAULTs

From: Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com>
To: Serge Rielau <serge(at)rielau(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fast AT ADD COLUMN with DEFAULTs
Date: 2016-10-05 21:45:36
Message-ID: CAKOSWNnWBu-Q3Xh38pMmvESB4KN2=uj1gaFav2EjzTFL=i9Krw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/5/16, Serge Rielau <serge(at)rielau(dot)com> wrote:
> Dear Hackers,
>
> I’m working on a patch that expands PG’s ability to add columns to a table
> without a table rewrite (i.e. at O(1) cost) from the
> nullable-without-default to a more general case. E.g.
>
> CREATE TABLE T(pk INT NOT NULL PRIMARY KEY);
> INSERT INTO T VALEUS (1), (2), (3);
> ALTER TABLE T ADD COLUMN c1 INTEGER NOT NULL DEFAULT 5;
> INSERT INTO T VALUES (4, DEFAULT);
> ALTER TABLE T ALTER COLUMN SET DEFAULT 6;
> INSERT INTO T VALUS (5, DEFAULT);
> SELECT * FROM T ORDER BY pk;
> =>
> (1, 5),
> (2, 5),
> (3, 5),
> (4, 5),
> (5, 6);
>
> Rows 1-3 have never been updated, yet they know that their values of c1 is
> 5.
>
> The requirement is driven by large tables for which add column takes too
> much time and/or produces too large a transaction for comfort.
>
> In simplified terms:
>
> * a second “exist default” is computed and stored in
> the catalogs at time of AT ADD COLUMN
>
> * The exist default is cached in the tuple descriptor (e.g in attrdef)
>
> * When one of the getAttr or copytuple related routines is invoked
> the exist default is filled in instead of simply NULL padding if the
> tuple is shorter the requested attribute number.
>
> Is there an interest in principle in the community for this functionality?

Wow! I think it would be great! It also solves huge vacuuming after
rewriting the table(s).
Just pay attention to corner cases like indexes, statistics and speed.

But I'd like to see solution for more important cases like:
CREATE TABLE t (pk INT NOT NULL PRIMARY KEY);
INSERT INTO t VALUES (1), (2), (3);
ALTER TABLE t ADD COLUMN c1 timestamptz NOT NULL DEFAULT 'now';
SELECT * FROM t ORDER BY pk;
ALTER TABLE t ADD COLUMN c2 serial;
SELECT * FROM t ORDER BY pk;
INSERT INTO t(pk) VALUES (4);
SELECT * FROM t ORDER BY pk;

P.S.: I really think it is a good idea, just some research is
necessary and covering corner cases...

--
Best regards,
Vitaly Burovoy

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2016-10-05 21:50:30 Switch to unnamed POSIX semaphores as our preferred sema code?
Previous Message Vladimir Gordiychuk 2016-10-05 21:35:19 Re: Stopping logical replication protocol