Re: ALTER TABLE ADD COLUMN fast default

From: Tomas Vondra <tomas(dot)vondra(at)2ndquadrant(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>
Cc: Thomas Munro <thomas(dot)munro(at)enterprisedb(dot)com>, David Rowley <david(dot)rowley(at)2ndquadrant(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: ALTER TABLE ADD COLUMN fast default
Date: 2018-02-03 00:44:30
Message-ID: e198660e-ca2d-a2ea-6210-5af376a345fe@2ndquadrant.com
Lists: pgsql-hackers

On 02/01/2018 02:54 PM, Andres Freund wrote:
> Hi,
>
> On 2018-01-26 10:53:12 +1030, Andrew Dunstan wrote:
>> Yeah, thanks. revised patch attached
>
> Given that this patch touches code that's a huge bottleneck in a lot of
> cases, I think this needs benchmarks that heavily exercises tuple
> deforming.
>

That's a reasonable request, I guess, and I tried to collect such data
today. I measured two cases:

1) Table with 1000 columns and 64 rows, when there were no ALTER TABLE
adding columns with 'fast' defaults. This is meant to measure the best
case, with minimal impact from the patch. See the create.sql script
attached to this message, and query.sql, which was used for the pgbench
tests like this:

pgbench -n -f query.sql -T 15 test
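The attached create.sql is not reproduced inline, but a table of roughly the benchmarked shape can be generated with a short shell sketch like this (the column names c1..c1000, the int type, and the row contents are my assumptions for illustration, not necessarily what the actual script contains):

```shell
# Sketch: emit DDL for a 1000-column table plus 64 rows, roughly the
# shape benchmarked above. Column names and types are assumptions.
{
  printf 'CREATE TABLE t ('
  for i in $(seq 1 1000); do
    [ "$i" -gt 1 ] && printf ', '
    printf 'c%d int' "$i"
  done
  printf ');\n'
  printf 'INSERT INTO t (c1) SELECT 1 FROM generate_series(1, 64);\n'
} > create.sql
```

The generated file can then be fed to psql before running the pgbench script above.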

after 100 runs, the results (tps) look like this:

          min    max    median
 -------------------------------
 master   1827   1873    1860
 patched  2023   2066    2056

That is, the patch apparently improves the performance by about 10%
(according to perf profiles this is due to slot_deform_tuple getting
cheaper).

So this case seems fine.

2) Table with 64 rows and 1000 columns, all added by ALTER TABLE with
fast default without rewrite. See create-alter.sql.
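Again, the attachment is not inlined, but the shape of create-alter.sql can be approximated with a sketch like this (the column names, the id column, and the default value 1 are illustrative assumptions; the real script may differ):

```shell
# Sketch: insert the rows first, then add all 1000 columns via
# ALTER TABLE ... ADD COLUMN ... DEFAULT, so every one of those
# attributes is served from the "missing" (fast default) path rather
# than from the heap tuple itself. Names/values are assumptions.
{
  printf 'CREATE TABLE t (id int);\n'
  printf 'INSERT INTO t SELECT i FROM generate_series(1, 64) i;\n'
  for i in $(seq 1 1000); do
    printf 'ALTER TABLE t ADD COLUMN c%d int DEFAULT 1;\n' "$i"
  done
} > create-alter.sql
```

The ordering matters: because the rows predate every ALTER TABLE, none of the added columns are physically stored, which is exactly what makes slot_getmissingattrs dominate the profile below.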

Using the same query.sql as before, this shows a significant drop, to
only about 40 tps (from ~2000 tps on master). The profiles look
something like this:

+ 98.87% 98.87% postgres [.] slot_getmissingattrs
+ 98.77% 0.00% postgres [.] PortalRun
+ 98.77% 0.00% postgres [.] ExecAgg
+ 98.74% 0.01% postgres [.] ExecInterpExpr

which is kinda understandable, although the drop from ~2000 tps to 40
tps seems pretty significant. But then again, this case is constructed
as a fairly extreme corner case.

However, there seems to be some sort of bug. When I did VACUUM FULL, I
expected it to replace the "missing" default values with actual values
stored in the heap rows, eliminating the performance impact. Instead,
the default values got lost and replaced by NULL values, which seems
like a clear data loss scenario.

I'm not quite sure what's wrong, but try this:

\i create-alter.sql

-- this returns 64, which is correct
SELECT COUNT(*) FROM t;

-- this actually returns 64 rows with values "1"
SELECT c1000 FROM t;

-- this returns 63, which is incorrect (should be 64)
SELECT count(c1000) FROM t;

VACUUM FULL t;

-- suddenly we only get NULL values for all 64 rows
SELECT c1000 FROM t;

-- and now we get 0 (instead of 64)
SELECT count(c1000) FROM t;

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment Content-Type Size
create.sql application/sql 199.9 KB
create-alter.sql application/sql 43.9 KB
query.sql application/sql 26 bytes
