Re: Fast AT ADD COLUMN with DEFAULTs

From: Jim Nasby <Jim(dot)Nasby(at)BlueTreble(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Serge Rielau <serge(at)rielau(dot)com>
Cc: Vitaly Burovoy <vitaly(dot)burovoy(at)gmail(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Jeff Janes <jeff(dot)janes(at)gmail(dot)com>, Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Fast AT ADD COLUMN with DEFAULTs
Date: 2016-10-10 03:24:27
Message-ID: 15cce493-37e1-88ad-e02a-8ed9bff22a34@BlueTreble.com
Lists: pgsql-hackers

On 10/6/16 11:01 AM, Tom Lane wrote:
> Something based on missing_value/absent_value could work for me too.
> If we name it something involving "default", that definitely increases
> the possibility for confusion with the regular user-settable default.
>
> Also worth thinking about here is that the regular default expression
> affects what will be put into future inserted rows, whereas this thing
> affects the interpretation of past rows. So it's really quite a different
> animal. That's kind of leading me away from calling it creation_default.

There's actually another use case here that's potentially extremely
valuable for warehousing and other "big data": compact representation of
a default value.

The idea here is that if you have a specific value for a field that
makes up a very large portion of your data, you'd really like to be able
to represent that value *in each row* with something like a bit (such as
we currently do for NULLs).

What I'd expect to see in the real world (once users figure this hack
out) would be:

CREATE TABLE ...(
...
-- skip field_a so we can handle all its common values
);

INSERT INTO
...
SELECT
...
WHERE field_a IS NOT DISTINCT FROM 'really common value'
;

ALTER TABLE
ADD field_a ... NOT NULL DEFAULT 'really common value'
;

-- load rest of the data

That would have the effect of storing all those really common values
with a single bit.
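
To make that concrete, here is a minimal sketch of the same sequence with
made-up names (events, staging_events and status are all hypothetical, as
is the sample data). It assumes the fast ADD COLUMN behavior under
discussion in this thread; without it, the ALTER TABLE simply rewrites
the table and nothing is saved.

```sql
-- Hypothetical staging table holding the raw data, including the
-- column whose most common value we want to store compactly.
CREATE TABLE staging_events (
    id      bigint PRIMARY KEY,
    payload text,
    status  text NOT NULL
);

INSERT INTO staging_events (id, payload, status) VALUES
    (1, 'a', 'ok'),
    (2, 'b', 'ok'),
    (3, 'c', 'failed'),
    (4, 'd', 'ok');

-- Target table: status is deliberately left out for now.
CREATE TABLE events (
    id      bigint PRIMARY KEY,
    payload text
);

-- Step 1: load only the rows that carry the really common value.
INSERT INTO events (id, payload)
SELECT id, payload
FROM staging_events
WHERE status IS NOT DISTINCT FROM 'ok';

-- Step 2: add the column. Under the assumed fast path the rows loaded
-- in step 1 are not rewritten; they just read back as 'ok'.
ALTER TABLE events
    ADD COLUMN status text NOT NULL DEFAULT 'ok';

-- Step 3: load the rest of the data; these rows store status explicitly.
INSERT INTO events (id, payload, status)
SELECT id, payload, status
FROM staging_events
WHERE status IS DISTINCT FROM 'ok';

-- Every row reports a status, whether or not it is physically stored.
SELECT id, status FROM events ORDER BY id;
```

The query results are the same with or without the fast path; only the
on-disk representation of the rows loaded in step 1 would differ.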

What we'd ultimately want is some kind of catalog versioning, so that we'd
know what was in place when each tuple was created; that would allow for
changing these things over time without forcing a full rewrite.
--
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532) mobile: 512-569-9461
