Re: should INSERT SELECT use a BulkInsertState?

From: Justin Pryzby <pryzby(at)telsasoft(dot)com>
To: Andres Freund <andres(at)anarazel(dot)de>
Cc: pgsql-hackers(at)lists(dot)postgresql(dot)org
Subject: Re: should INSERT SELECT use a BulkInsertState?
Date: 2020-07-13 01:57:00
Message-ID: 20200713015700.GA23581@telsasoft.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Jun 04, 2020 at 10:30:47AM -0700, Andres Freund wrote:
> On 2020-05-08 02:25:45 -0500, Justin Pryzby wrote:
> > Seems to me it should, at least conditionally. At least if there's a function
> > scan or a relation or ..
>
> Well, the problem is that this can cause very very significant
> regressions. As in 10x slower or more. The ringbuffer can cause constant
> XLogFlush() calls (due to the lsn interlock), and the eviction from
> shared_buffers (regardless of actual available) will mean future vacuums
> etc will be much slower. I think this is likely to cause pretty
> widespread regressions on upgrades.
>
> Now, it sucks that we have this problem in the general facility that's
> supposed to be used for this kind of bulk operation. But I don't really
> see it realistic as expanding use of bulk insert strategies unless we
> have some more fundamental fixes.

I made this conditional on BEGIN BULK/SET bulk, so I'll solicit comments on that.

postgres=# \t on \\ \set QUIET \\ VACUUM FULL t; \dt+ t \\ begin ; \timing on \\ INSERT INTO t SELECT * FROM t; rollback; SELECT COUNT(1), usagecount FROM pg_buffercache GROUP BY 2 ORDER BY 2;
| public | t | table | pryzbyj | 35 MB |
|Time: 9497.318 ms (00:09.497)
| 33 | 1
| 3 | 2
| 18 | 3
| 5 | 4
| 4655 | 5
| 11670 |

vs

postgres=# \t on \\ \set QUIET \\ VACUUM FULL t; \dt+ t \\ begin BULK ; \timing on \\ INSERT INTO t SELECT * FROM t; rollback; SELECT COUNT(1), usagecount FROM pg_buffercache GROUP BY 2 ORDER BY 2;
| public | t | table | pryzbyj | 35 MB |
|Time: 8268.780 ms (00:08.269)
| 2080 | 1
| 3 | 2
| 19 | 4
| 234 | 5
| 14048 |

And:

postgres=# begin ; \x \\ \t \\ SELECT statement_timestamp(); \o /dev/null \\ SELECT 'INSERT INTO t VALUES(0)' FROM generate_series(1,999999); \set ECHO errors \\ \set QUIET on \\ \o \\ \gexec \\ SELECT statement_timestamp(); abort; \x \\ SELECT COUNT(1), usagecount FROM pg_buffercache GROUP BY 2 ORDER BY 2; a
|statement_timestamp | 2020-07-12 20:31:43.717328-05
|statement_timestamp | 2020-07-12 20:36:16.692469-05
|
| 52 | 1
| 24 | 2
| 17 | 3
| 6 | 4
| 4531 | 5
| 11754 |

vs

postgres=# begin BULK ; \x \\ \t \\ SELECT statement_timestamp(); \o /dev/null \\ SELECT 'INSERT INTO t VALUES(0)' FROM generate_series(1,999999); \set ECHO errors \\ \set QUIET on \\ \o \\ \gexec \\ SELECT statement_timestamp(); abort; \x \\ SELECT COUNT(1), usagecount FROM pg_buffercache GROUP BY 2 ORDER BY 2; a
|statement_timestamp | 2020-07-12 20:43:47.089538-05
|statement_timestamp | 2020-07-12 20:48:04.798138-05
|
| 4456 | 1
| 22 | 2
| 1 | 3
| 7 | 4
| 79 | 5
| 11819 |

--
Justin

Attachment Content-Type Size
v2-0001-Allow-INSERT-SELECT-to-use-a-BulkInsertState.patch text/x-diff 7.2 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro Horiguchi 2020-07-13 01:57:36 Re: archive status ".ready" files may be created too early
Previous Message Kyotaro Horiguchi 2020-07-13 01:48:19 Re: Physical slot restart_lsn advances incorrectly requiring restore from archive