Re: Fast AT ADD COLUMN with DEFAULTs

From: Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com>
To: Serge Rielau <serge(at)rielau(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fast AT ADD COLUMN with DEFAULTs
Date: 2016-10-06 17:03:24
Message-ID: CAKOSWNmF-ZZtntD4BASZxv8S86UWqZt-tEXER5WdrCcV=1Ok5g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 10/6/16, Serge Rielau <serge(at)rielau(dot)com> wrote:
>> On Oct 6, 2016, at 9:20 AM, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>> Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com> writes:
>>> But what I discover for myself is that we have pg_attrdef separately
>>> from the pg_attribute. Why?
>>
>> The core reason for that is that the default expression needs to be
>> a separate object from the column for purposes of dependency analysis.
>> For example, if you have a column whose default is "foo()", then the
>> default expression depends on the function foo(), but the column should
>> not: if you drop the function, only the default expression ought to
>> be dropped, not the column.
>>
>> Because of this, the default expression needs to have its own OID
>> (to be stored in pg_depend) and it's convenient to store it in a
>> separate catalog so that the classoid can identify it as being a
>> default expression rather than some other kind of object.
>
> Good to know.
>
>> If we were going to allow these missing_values or creation_defaults
>> or whatever they're called to be general expressions, then they would
>> need
>> to have their own OIDs for dependency purposes. That would lead me to
>> think that the best representation is to put them in their own rows in
>> pg_attrdef, perhaps adding a boolean column to pg_attrdef to distinguish
>> regular defaults from these things. Or maybe they even need their own
>> catalog, depending on whether you think dependency analysis would want
>> to distinguish them from regular defaults using just the classed.
>>
>> Now, as I just pointed out in another mail, realistically we're probably
>> going to restrict the feature to simple constants, which'd mean they will
>> depend only on the column's type and can never need any dependencies of
>> their own. So we could take the shortcut of just storing them in a new
>> column in pg_attribute.

I agree with you.

>> But maybe that's shortsighted and we'll
>> eventually wish we'd done them as full-fledged separate objects.

I don't think so. If we try to implement non-blocking adding columns
with volatile defaults (and for instance update old rows in the
background), we can end up with the next situation:

CREATE TABLE a(i bigint PRIMARY KEY);
INSERT INTO a SELECT generate_series(1,10000000000);

ALTER TABLE a ADD COLUMN b bigserial CHECK (b BETWEEN 1 AND 100);

For indexes (even unique) created concurrently similar troubles are
solved with a "not valid" mark, but what to do with a heap if we try
to do it in the background?

>> But on the third hand ... once one of these is in place, how could you
>> drop it separately from the column? That would amount to a change in the
>> column's stored data, which is not what one would expect from dropping
>> a separate object. So maybe it's senseless to think that these things
>> could ever be distinct objects. But that definitely leads to the
>> conclusion that they're constants and nothing else.

> I cannot follow this reasoning.
> Let’s look past what PG does today:
> For each row (whether that’s necessary or not) we evaluate the expression,
> compute the value and
> store it in the rewritten table.
> We do not record dependencies on the “pedigree” of the value.
> It happened to originate from the DEFAULT expression provided with the ADD
> COLUMN,
> but that is not remembered anywhere.
> All we remember is the value - in each row.
> So the only change that is proposed here - when it comes right down to it -
> is to remember the value once only (IFF it is provably the same for each row)
> and thus avoid the need to rewrite the table.
> So I see no reason to impose any restriction other than “evaluated value is
> provably the same for every row”.

Tom says the same thing. The expression at the end should be a value
if it allows to avoid rewriting table.

> Regarding the location of storage.
> I did start of using pg_attrdef, but ran into some snags.
> My approach was to add the value as an extra column (rather than an extra
> row).
> That caused trouble since a SET DEFAULT operation is decomposed into a DROP
> and a SET and
> preserving the value across such operations did not come naturally.

I'm sorry for making you be confused. The best way is to use an extra
column in the pg_attribute to store serialized value.

> If we were to use extra rows instead that issue would be solved, assuming we
> add a “default kind” sort of column.
> It would dictate the storage format though which may be considered overkill
> for a a constant.

--
Best regards,
Vitaly Burovoy

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Pavan Deolasee 2016-10-06 17:59:38 FSM corruption leading to errors
Previous Message Serge Rielau 2016-10-06 16:39:05 Re: Fast AT ADD COLUMN with DEFAULTs