Test report on GENERATED/IDENTITY

From: Zoltan Boszormenyi <zboszor(at)dunaweb(dot)hu>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Test report on GENERATED/IDENTITY
Date: 2007-03-07 16:22:13
Message-ID: 45EEE6B5.2020305@dunaweb.hu
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

I made some tests to prove that GENERATED can help
boost performance. I created a table like this:

create table t1 (
id serial,
i1 integer,
i2 integer,
g1 integer generated always as (
case when i1 is null then i2
when i2 is null then i1
else i1 + i2 end));

I inserted 1 million records into the table:

for (i = 1; i <= 1000; i++)
for (j = 1; j <= 1000; j++)
INSERT INTO t1 (i1, i2) VALUES (i, j);

After VACUUM FULL ANALYZE,
I timed SELECT id, i1, i2, g1 FROM t1
and SELECT id, i1, i2, <generation expression> FROM t1,
result redirected to /dev/null. Results of ten consecutive runs are:

SELECT id, i1, i2, g1 FROM t1
------------------------------------------

2.59user 0.18system 0:04.75elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33536minor)pagefaults 0swaps
2.57user 0.19system 0:04.59elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34068minor)pagefaults 0swaps
2.60user 0.16system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33826minor)pagefaults 0swaps
2.59user 0.17system 0:04.82elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34067minor)pagefaults 0swaps
2.59user 0.16system 0:04.61elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34068minor)pagefaults 0swaps
2.61user 0.17system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33574minor)pagefaults 0swaps
2.59user 0.18system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34067minor)pagefaults 0swaps
2.59user 0.18system 0:04.67elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32147minor)pagefaults 0swaps
2.58user 0.19system 0:04.63elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33584minor)pagefaults 0swaps
2.73user 0.16system 0:04.94elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34066minor)pagefaults 0swaps

Average is 4.68 seconds.

SELECT id, i1, i2, <generation expression> FROM t1
------------------------------------------

2.76user 0.18system 0:05.49elapsed 53%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33945minor)pagefaults 0swaps
2.69user 0.17system 0:05.23elapsed 54%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33917minor)pagefaults 0swaps
2.60user 0.18system 0:05.04elapsed 55%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32027minor)pagefaults 0swaps
2.64user 0.16system 0:05.03elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32626minor)pagefaults 0swaps
2.60user 0.15system 0:05.03elapsed 54%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32550minor)pagefaults 0swaps
2.77user 0.18system 0:05.21elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32609minor)pagefaults 0swaps
2.71user 0.17system 0:05.07elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33043minor)pagefaults 0swaps
2.66user 0.17system 0:05.12elapsed 55%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33042minor)pagefaults 0swaps
2.80user 0.16system 0:05.19elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33043minor)pagefaults 0swaps
2.71user 0.16system 0:05.14elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32052minor)pagefaults 0swaps

Average is 5.16 seconds.

Using a single GENERATED column gains a good 9.4% in runtime.

With the IDENTITY, the picture is not so bright.
I expected some cost but not that much. Why is
[simple_]heap_update() so expensive? I created a table and
times inserting 1 million rows into it:

create table t2 (id serial, i1 integer);
or
create table t2 (id serial generated always as identity, i1 integer);

Using a serial column gave me about 12 seconds
on the average of 5 runs. With an IDENTITY column,
I got 61 seconds once and 66 seconds twice.
So, the strictness of the identity column gave me 500-550%
performance penalty.

With a single unique index on i1, I got 24.4 seconds
with the serial column and 67 seconds for the identity
column. I run these only once so this last one isn't
representative.

I tried to use heap_inplace_update() to update the
newly updated or inserted tuple in place but it gave me

ERROR: heap_inplace_update: wrong tuple length

even when I already filled the IDENTITY column with
a constant Datum with an Int64 value 0 converted to
the type of the column.

If I read it correctly, the HOT patch would give me
a speedup for this case?

Best regards,
Zoltán Böszörményi

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fuhry 2007-03-07 16:28:22 Re: PostgreSQL - 'SKYLINE OF' clause added!
Previous Message Teodor Sigaev 2007-03-07 16:19:11 Re: GIST and TOAST