Batch insert in CTAS/MatView code

From: Paul Guo <pguo(at)pivotal(dot)io>
To: pgsql-hackers(at)postgresql(dot)org
Cc: Taylor Vesely <tvesely(at)pivotal(dot)io>
Subject: Batch insert in CTAS/MatView code
Date: 2019-03-06 14:06:27
Message-ID: CAEET0ZHRWxbRUgwzUK_tOFDWx7VE2-P=xMBT6-N+gAa9WQ=xxA@mail.gmail.com
Lists: pgsql-hackers

Hello, Postgres hackers,

The COPY code uses batch insert, via heap_multi_insert(), to speed up data
loading. It seems that CREATE TABLE AS and materialized views could leverage
the same code to boost performance as well. Attached is a patch that
implements this; it was done by Taylor (cc-ed) and me.

The patch also modifies heap_multi_insert() a bit for further code-level
optimization: it uses static memory instead of a memory context and dynamic
allocation. For ModifyTable inserts there are more limitations on batch
insert (triggers, etc.?), but perhaps we could still batch in the cases
where it is safe to do so?
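
To make the CTAS side of the idea concrete, here is a rough sketch (not the
attached patch; the struct layout and the intorel_flush() /
intorel_receive_sketch() names are made up for illustration) of how the
DestReceiver in createas.c could buffer tuples and flush them through
heap_multi_insert(), the way COPY does:

/*
 * Rough sketch only: buffer tuples in the CTAS DestReceiver (createas.c)
 * and flush them in batches through heap_multi_insert(), the way COPY
 * does.  DR_intorel_sketch, intorel_flush() and intorel_receive_sketch()
 * are made-up names for illustration, not what the attached patch uses.
 */
#include "postgres.h"

#include "access/heapam.h"
#include "access/htup_details.h"
#include "executor/tuptable.h"
#include "tcop/dest.h"
#include "utils/rel.h"

#define MAX_BUFFERED_TUPLES 1000    /* same tuple-count threshold COPY uses */

typedef struct DR_intorel_sketch
{
    DestReceiver pub;           /* must be first */
    Relation    rel;            /* target table */
    CommandId   output_cid;
    int         hi_options;     /* heap_insert options, e.g. HEAP_INSERT_SKIP_FSM */
    BulkInsertState bistate;
    HeapTuple   buffered[MAX_BUFFERED_TUPLES];  /* new: batch buffer */
    int         nbuffered;
} DR_intorel_sketch;

/* Write out whatever has accumulated in the buffer. */
static void
intorel_flush(DR_intorel_sketch *myState)
{
    if (myState->nbuffered == 0)
        return;

    heap_multi_insert(myState->rel,
                      myState->buffered,
                      myState->nbuffered,
                      myState->output_cid,
                      myState->hi_options,
                      myState->bistate);

    for (int i = 0; i < myState->nbuffered; i++)
        heap_freetuple(myState->buffered[i]);
    myState->nbuffered = 0;
}

/* receiveSlot callback: buffer the tuple instead of heap_insert()ing it. */
static bool
intorel_receive_sketch(TupleTableSlot *slot, DestReceiver *self)
{
    DR_intorel_sketch *myState = (DR_intorel_sketch *) self;

    /* ExecCopySlotTuple() on PG <= 11; memory management glossed over */
    myState->buffered[myState->nbuffered++] = ExecCopySlotHeapTuple(slot);

    if (myState->nbuffered >= MAX_BUFFERED_TUPLES)
        intorel_flush(myState);

    return true;
}

A final flush from the DestReceiver's shutdown callback is also needed so
that the trailing partial batch gets written.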

By the way, while looking at the code I noticed that toast_insert_or_update()
declares nine large local arrays, which looks like a stack overflow risk.
Maybe we should make them static or global.
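
For a rough sense of scale (back-of-the-envelope only; the exact
declarations are in tuptoaster.c): all of those arrays are sized by
MaxHeapAttributeNumber (1600), so on a 64-bit build they add up to roughly
50 kB of stack per call:

/*
 * Back-of-the-envelope stack cost of the MaxHeapAttributeNumber-sized
 * locals in toast_insert_or_update(); element types approximated from
 * memory, so treat the breakdown as illustrative rather than exact.
 */
#include "postgres.h"
#include "access/htup_details.h"    /* MaxHeapAttributeNumber = 1600 */

Size
toast_locals_stack_estimate(void)
{
    /* roughly: two Datum arrays, one pointer array, one int32 array,
     * four bool arrays and one char array, all [MaxHeapAttributeNumber] */
    return MaxHeapAttributeNumber *
        (2 * sizeof(Datum) + sizeof(void *) + sizeof(int32) +
         4 * sizeof(bool) + sizeof(char));  /* ~52 kB on a 64-bit build */
}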

Here is some quick, simple performance testing on a mirrorless Postgres
instance, using the SQL below. The tests cover tables with small column
widths, large column widths, and toasted data.

-- tuples with small size.
drop table if exists t1;
create table t1 (a int);

insert into t1 select * from generate_series(1, 10000000);
drop table if exists t2;
\timing
create table t2 as select * from t1;
\timing

-- tuples that are untoasted and data that is 1664 bytes wide
drop table if exists t1;
create table t1 (a name, b name, c name, d name, e name, f name,
                 g name, h name, i name, j name, k name, l name,
                 m name, n name, o name, p name, q name, r name,
                 s name, t name, u name, v name, w name, x name,
                 y name, z name);

insert into t1 select 'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j',
'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y',
'z' from generate_series(1, 500000);
drop table if exists t2;
\timing
create table t2 as select * from t1;
\timing

-- tuples that are toastable.
drop table if exists t1;
create table t1 (a text, b text, c text, d text, e text, f text,
                 g text, h text, i text, j text, k text, l text,
                 m text, n text, o text, p text, q text, r text,
                 s text, t text, u text, v text, w text, x text,
                 y text, z text);

insert into t1 select i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i, i,
i, i, i, i, i, i, i, i from (select repeat('123456789', 10000) from
generate_series(1,2000)) i;
drop table if exists t2;
\timing
create table t2 as select * from t1;
\timing

Here are the timing results:

With the patch:

Time: 4728.142 ms (00:04.728)     -- small tuples
Time: 14203.983 ms (00:14.204)    -- wide untoasted tuples
Time: 1008.669 ms (00:01.009)     -- toastable tuples

Baseline:

Time: 11096.146 ms (00:11.096)    -- small tuples
Time: 13106.741 ms (00:13.107)    -- wide untoasted tuples
Time: 1100.174 ms (00:01.100)     -- toastable tuples

For the wide untoasted and toast cases the difference is under 10% (the wide
case is slightly slower with the patch, the toast case slightly faster), but
for small tuples the improvement is very large. If I hardcode the batch
count as 4, all test cases improve over the baseline, but the small-tuple
improvement is smaller than with the current patch. The number 4 is quite
case-specific, so I cannot hardcode it in the patch. Of course we could tune
it further, but the current value seems to be a reasonable trade-off?
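
For reference, COPY's heap_multi_insert() path does not use a fixed batch
count either: it flushes once it has buffered 1000 tuples or roughly 64 kB
of tuple data, whichever comes first, so wide or toasted rows end up in
smaller batches automatically. A similar rule might work better here than
any hand-picked count. Again only a sketch, reusing the hypothetical state
from the sketch above plus an extra buffered_bytes counter:

/*
 * Sketch of a size-aware flush rule modeled on COPY's behavior (flush at
 * 1000 buffered tuples or ~64 kB of buffered data).  Assumes an extra
 * "Size buffered_bytes" field in the DR_intorel_sketch state from the
 * earlier sketch; again not the attached patch.
 */
#define MAX_BUFFERED_BYTES 65535

static void
intorel_buffer_tuple(DR_intorel_sketch *myState, HeapTuple tuple)
{
    myState->buffered[myState->nbuffered++] = tuple;
    myState->buffered_bytes += tuple->t_len;

    /* flush on either threshold, so wide/toasted rows flush in smaller batches */
    if (myState->nbuffered >= MAX_BUFFERED_TUPLES ||
        myState->buffered_bytes >= MAX_BUFFERED_BYTES)
    {
        intorel_flush(myState);
        myState->buffered_bytes = 0;
    }
}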

Thanks.

Attachment Content-Type Size
0001-Heap-batch-insert-for-CTAS.patch application/octet-stream 7.0 KB
