| From: | Haibo Yan <tristan(dot)yim(at)gmail(dot)com> |
|---|---|
| To: | Daniil Davydov <3danissimo(at)gmail(dot)com> |
| Cc: | Jingtang Zhang <mrdrivingduck(at)gmail(dot)com>, pgsql-hackers(at)lists(dot)postgresql(dot)org, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com> |
| Subject: | Re: Introduce new multi insert Table AM and improve performance of various SQL commands with it for Heap AM |
| Date: | 2026-04-28 04:27:16 |
| Message-ID: | CABXr29ExTDUoWZsiA0w+mkrjkJWKhPdiseM7vHJgHY-Mwq9PNg@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
On Sun, Apr 26, 2026 at 2:56 PM Daniil Davydov <3danissimo(at)gmail(dot)com> wrote:
> Hi,
>
> On Sun, Apr 6, 2025 at 8:55 PM Jingtang Zhang <mrdrivingduck(at)gmail(dot)com>
> wrote:
> >
> > It was quite a while since I last looked at the patch. I've tested it
> again,
> > and still get regression on patched version where a table has many
> columns.
> > And it is totally CPU-bounded on tts_virtual_copyslot.
> >
> > Unpatched version:
> > 1 col:
> > Time: 8909.714 ms (00:08.910)
> > Time: 8803.579 ms (00:08.804)
> > Time: 8600.415 ms (00:08.600)
> > 32 cols:
> > Time: 12911.699 ms (00:12.912)
> > Time: 13543.491 ms (00:13.543)
> > Time: 13325.368 ms (00:13.325)
> >
> > Patched version:
> > 1 col:
> > Time: 3532.841 ms (00:03.533)
> > Time: 3598.223 ms (00:03.598)
> > Time: 3515.858 ms (00:03.516)
> > 32 cols:
> > Time: 35647.724 ms (00:35.648)
> > Time: 35596.233 ms (00:35.596)
> > Time: 35669.106 ms (00:35.669)
> >
>
> Hm, maybe I didn't choose the best way to measure performance. Can you
> please share how you do it?
>
> > I've tested your patch with tuplestore and found the regression does not
> exist
> > anymore, but I haven't look deep inside it.
> >
> > Patched version (with tuplestore):
> > 1 col:
> > Time: 3500.502 ms (00:03.501)
> > Time: 3486.886 ms (00:03.487)
> > Time: 3514.233 ms (00:03.514)
> > 32 cols:
> > Time: 10375.391 ms (00:10.375)
> > Time: 10248.256 ms (00:10.248)
> > Time: 10248.289 ms (00:10.248)
> >
> > It seems to be a good idea if there is no other issue with your patch.
>
> As far as I understand, the use of multi inserts for queries like
> "INSERT INTO ... SELECT FROM" is not discussed here anymore due to the
> fact that in such cases we will have to take into account the volatile
> functions and ROW triggers.
> I've been thinking about this for a while and made a patch as an
> experiment. The principles that the patch works on are listed below.
> 1)
> Since performance decreases for single INSERTs (within a multi inserts
> mechanism), I designed this feature as an option for the table. Thus,
> if the user knows that he will perform a lot of inserts on the table,
> he can specify "WITH (append_optimized=true)".
> 2)
> The availability of volatile functions is monitored during the
> construction of a subtree for a ModifyTable node. I'm not that
> familiar with the query plan construction mechanism, but it seems to
> me that this way we can track any occurrence of volatile functions.
> Of course, most volatile functions don't stop us from using multi
> inserts, but checking each such function would take a very long time,
> so the very fact of having a volatile function is enough for us to
> abandon multi-inserts.
> 3)
> Default expressions of the target table are also checked for volatile
> functions. The same rules apply to them as in (2). As an exception, I
> allowed the use of SERIAL in the column data type, since this is a
> fairly common use case.
> 4)
> If the target table contains any ROW triggers, we don't use multi insert.
> 5)
> Patch also contains a regression test. This is a "sandbox" where you
> can do some experiments with append-optimized tables.
>
> I hope that patch (targeted on 'master' branch,
> 2c7bd2ba507e273f2d7fe1b2f6d30775ed4f3c09) will be useful for this
> thread.
>
> --
> Best regards,
> Daniil Davydov
>
Hi all,
I picked this work up again and implemented the full 5-patch series.
The series is structured as follows:
- 0001 adds the buffered-insert lifecycle API in tableam/heapam and
provides the heap implementation.
- 0002 adopts the API for CTAS.
- 0003 adopts the API for CREATE MATERIALIZED VIEW and REFRESH
MATERIALIZED VIEW.
- 0004 adopts the API for COPY FROM.
- 0005 adopts the API for a restricted first step of INSERT INTO … SELECT
.
I also reran performance testing locally on my machine:
- Hardware: MacBook Pro M4, 36GB RAM
- shared_buffers: 128MB
I compared the unpatched baseline against the current patched series for
CTAS, CMV, RMV, and INSERT INTO ... SELECT.
Table 1 — Median (ms)
Workload
10K Before
10K After
10K Improv
100K Before
100K After
100K Improv
1M Before
1M After
1M Improv
CTAS
1.60
1.17
+26.9%
9.19
5.70
+38.0%
105.61
73.28
+30.6%
CMV
2.11
1.77
+16.1%
10.28
6.20
+39.7%
110.10
79.64
+27.7%
RMV
1.62
1.19
+26.5%
9.91
5.43
+45.2%
106.04
69.57
+34.4%
INSERT … SELECT
1.72
0.89
+48.3%
15.39
7.46
+51.5%
228.24
84.66
+62.9%
Table 2 — Average (ms)
Workload
10K Before
10K After
10K Improv
100K Before
100K After
100K Improv
1M Before
1M After
1M Improv
CTAS
1.65
1.26
+24.1%
9.37
5.82
+37.9%
104.81
74.31
+29.1%
CMV
2.34
1.82
+22.2%
10.32
6.25
+39.4%
110.32
80.50
+27.0%
RMV
1.92
1.21
+36.9%
9.86
5.49
+44.3%
106.79
69.53
+34.9%
INSERT … SELECT
1.69
0.90
+46.8%
15.45
7.39
+52.2%
210.62
85.19
+59.6%
These numbers look ok to me to continue the discussion with the current
design and implementation.
One point worth calling out explicitly: the INSERT INTO ... SELECT support
in patch 5 is intentionally limited, as described above. It is not intended
to claim broad executor coverage yet.
One further improvement still seems possible: making the heap
implementation cache raw HeapTuple bytes directly instead of maintaining
buffered slot arrays. I looked at that direction, but did not include it in
this series because it felt like a larger scope change than what I wanted
for v1.
At this point, I think the current series is in reasonable shape and I’d
really appreciate review on both the API shape and the caller adoptions.
Thanks,
Haibo
| Attachment | Content-Type | Size |
|---|---|---|
| v1-0001-tableam-heapam-add-buffered-insert-lifecycle-API-.patch | application/octet-stream | 41.2 KB |
| v1-0005-executor-adopt-buffered-insert-API-for-restricted.patch | application/octet-stream | 24.3 KB |
| v1-0002-createas-use-buffered-insert-API-for-CTAS.patch | application/octet-stream | 11.6 KB |
| v1-0003-matview-use-buffered-insert-API-for-CMV-and-RMV.patch | application/octet-stream | 12.0 KB |
| v1-0004-copy-adopt-buffered-insert-API-for-COPY-FROM.patch | application/octet-stream | 18.4 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | David Rowley | 2026-04-28 04:27:59 | Any reason to keep HEAP_HASOID_OLD? |
| Previous Message | Amit Kapila | 2026-04-28 04:25:35 | Re: Proposal: Conflict log history table for Logical Replication |