Re: should INSERT SELECT use a BulkInsertState?

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: should INSERT SELECT use a BulkInsertState?
Date: 2020-10-30 04:51:38
Message-ID: 20201030045138.GA3080@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Oct 22, 2020 at 01:29:53PM +0100, Simon Riggs wrote:
> On Fri, 16 Oct 2020 at 22:05, Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> > > > I made this conditional on BEGIN BULK/SET bulk, so I'll solicit comments on that.
>
> I think it would be better if this was self-tuning. So that we don't
> allocate a bulkinsert state until we've done say 100 (?) rows
> inserted.

I made it an optional, non-default behavior in response to the legitimate
concern for performance regression for the cases where a loader needs to be as
fast as possible - as compared with our case, where we want instead to optimize
for our reports by making the loaders responsible for their own writes, rather
than leaving behind many dirty pages, and clobbering the cache, too.

Also, INSERT SELECT doesn't immediately help us (telsasoft), since we use
INSERT .. VALUES () .. ON CONFLICT. This would handle that case, which is
great, even though that wasn't a design goal. It could also be an integer GUC
to allow configuring the size of the ring buffer.

> You should also use table_multi_insert() since that will give further
> performance gains by reducing block access overheads. Switching from
> single row to multi-row should also only happen once we've loaded a
> few rows, so we don't introduce overahads for smaller SQL statements.

Good idea...multi_insert (which reduces the overhead of individual inserts) is
mostly independent from BulkInsert state (which uses a ring-buffer to avoid
dirtying the cache). I made this 0002.

This makes INSERT SELECT several times faster, and not clobber the cache too.

Time: 4700.606 ms (00:04.701)
123 | 1
37 | 2
20 | 3
11 | 4
4537 | 5
11656 |

Time: 1125.302 ms (00:01.125)
2171 | 1
37 | 2
20 | 3
11 | 4
111 | 5
14034 |

When enabled, this passes nearly all regression tests, and all but 2 of the
changes are easily understood. The 2nd patch still needs work.

--
Justin

Attachment Content-Type Size
v5-0001-Allow-INSERT-SELECT-to-use-a-BulkInsertState.patch text/x-diff 9.0 KB
v5-0002-Make-INSERT-SELECT-use-multi_insert.patch text/x-diff 24.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Smith 2020-10-30 05:07:21 Re: Enumize logical replication message actions
Previous Message Kyotaro Horiguchi 2020-10-30 04:47:55 Re: MINUS SIGN (U+2212) in EUC-JP encoding is mapped to FULLWIDTH HYPHEN-MINUS (U+FF0D) in UTF-8