Re: Lazy constraints / defaults

From: "Dawid Kuroczko" <qnex42(at)gmail(dot)com>
To: "Tom Lane" <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Michał Zaborowski <michal(dot)zaborowski(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Lazy constraints / defaults
Date: 2008-03-09 21:45:59
Message-ID: 758d5e7f0803091445o2e0a0f74vaeac12d5c6c880bd@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sun, Mar 9, 2008 at 7:25 PM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
> "=?ISO-8859-2?Q?Micha=B3_Zaborowski?=" <michal(dot)zaborowski(at)gmail(dot)com> writes:
> > I would like to be able to add CONSTRAINT and/or DEFAULT with out
> > affecting old rows.
>
> You mean without actually checking that the old rows satisfy the
> constraint? There's approximately zero chance that that proposal
> will be accepted.

I think the problem here is to minimize the time when table is held by
exclusive lock,
Something similar to the CREATE INDEX CONCURRENTLY (i.e. hold exclusive lock
for a jiffy, then do the actual work for the old tuples).

So, the proposal would read as to add the ability to perform:

ALTER TABLE CONCURRENTLY ALTER COLUMN foo SET NOT NULL
...where exclusive lock would be held to place the constraint (so all new
tuples would satisfy it), lock would be released and the old tuples would
be checked to make sure the constraint is valid.

Should a NULL value be found or should the backend die, the constraint
should disappear or be marked invalid.

> > Yes, it sounds strange, but... Let's say I have
> > big table, I want to add new column, with DEFAULT and NOT NULL.
> > Normally it means long exclusive lock. So - right now I'm adding plain
> > new column, then DEFAULT, then UPDATE on all rows in chunks, then NOT
> > NULL... Can it be little simpler?
>
> Just do it all in one ALTER command.
>
> alter table tab add column col integer not null default 42 check (col > 0);

I think this will not solve the OP's problem. He wants to minimize the time
a table is under exclusive lock, and this ALTER command will effectively
rewrite the whole table (to add new not null column).

Probably a workable solution would be to play with inheritance:
-- Add the NULL col colum:
ALTER TABLE tab ADD COLUMN col integer;
-- Create a table which will have col NOT NULL
CREATE TABLE tab_new (LIKE tab INCLUDING DEFAULTS INCLUDING
CONSTRAINTS INCLUDING INDEXES ) INHERITS (tab);
ALTER TABLE tab_new ALTER COLUMN col SET NOT NULL;
-- Make the new values go to tab_new, if simple enough same might be
done for UPDATEs
CREATE RULE insert_new AS ON INSERT TO tab DO INSTEAD INSERT INTO
tab_new VALUES (NEW.*);

-- Now, make a job which will do something like this:
START TRANSACTION ISOLATON LEVEL SERIALIZABLE;
UPDATE ONLY tab SET col = 42 WHERE id BETWEEN n AND n + 1000;
INSERT INTO tab_new SELECT * FROM ONLY tab WHERE id BETWEEN n AND n + 1000;
-- or better:
-- INSERT INTO tab_new SELECT a,b,c,42 AS col FROM ONLY tab WHERE id
BETWEEN n AND n + 1000 FOR UPDATE;
DELETE FROM ONLY tab WHERE id BETWEEN n AND n + 1000;
COMMIT;

-- Finally, exhange parti^W^W get rid of old tab:
SELECT count(*) FROM ONLY tab; -- should be zero
ALTER TABLE tab RENAME TO tab_old;
ALTER TABLE tab_new RENAME TO tab;
ALTER TABLE tab NO INHERIT tab_old;

Of course each step should be done in transaction, probably starting
with explicit LOCK. And extra care should be taken
with respect to the UNIQUE constraints. In short: unless you are 100%
sure what you are doing, don't. :-)

Regards,
Dawid

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Stephen Denne 2008-03-09 22:41:16 Estimating geometric distributions
Previous Message Stephen Denne 2008-03-09 21:24:18 Re: Maximum statistics target