Re: should INSERT SELECT use a BulkInsertState?

From: Bharath Rupireddy <bharath(dot)rupireddyforpostgres(at)gmail(dot)com>
To: Justin Pryzby <pryzby(at)telsasoft(dot)com>
Cc: Luc Vlaming <luc(at)swarm64(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Heikki Linnakangas <hlinnaka(at)iki(dot)fi>
Subject: Re: should INSERT SELECT use a BulkInsertState?
Date: 2020-12-03 05:29:34
Message-ID: CALj2ACUee9pP24yLLXEiQhv3RkGGyjr-+gj+omNr-i+XThBnYw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Dec 2, 2020 at 10:24 PM Justin Pryzby <pryzby(at)telsasoft(dot)com> wrote:
>
> One loose end in this patch is how to check for volatile default expressions.
>
> copyfrom.c is a utility statement, so it can look at the parser's column list:
> COPY table(c1,c2)...
>
> However, for INSERT, in nodeModifyTable.c, it looks like parsing, rewriting,
> and planning are done, at which point I don't know if there's a good way to
> find that. The default expressions will have been rewritten into the planned
> statement.
>
> We need the list of columns whose default is volatile, excluding columns for
> which a non-default value is specified.
>
> INSERT INTO table (c1,c2) VALUES (1,default);
>
> We'd want the list of any column in the table with a volatile default,
> excluding columns c1, but not-excluding explicit default columns c2 or any
> implicit default columns (c3, etc).
>
> Any idea ?
>

I think we should be doing all the necessary checks in the planner and
have a flag in the planned stmt to indicate whether to go with multi
insert or not. For the required checks, we can have a look at how the
existing COPY decides to go with either CIM_MULTI or CIM_SINGLE.

Now, the question of how we can get to know whether a given relation
has default expressions or volatile expressions, it is worth to look
at build_column_default() and contain_volatile_functions().

I prefer to have the multi insert deciding code in COPY and INSERT
SELECT, in a single common function which can be reused. Though COPY
has somethings like default expressions and others ready unlike INSERT
SELECT, we can try to keep them under a common function and say for
COPY we can skip some code and for INSERT SELECT we can do extra work
to find default expressions.

Although unrelated, for parallel inserts in INSERT SELECT[1], in the
planner there are some checks to see if the parallelism is safe or
not. Check max_parallel_hazard_for_modify() in
v8-0001-Enable-parallel-SELECT-for-INSERT-INTO-.-SELECT.patch from
[1]. On the similar lines, we can also have multi insert deciding
code.

[1] https://www.postgresql.org/message-id/CAJcOf-fy3P%2BkDArvmbEtdQTxFMf7Rn2%3DV-sqCnMmKO3QKBsgPA%40mail.gmail.com

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Craig Ringer 2020-12-03 06:01:16 Re: [PATCH] Runtime control of CLOBBER_CACHE_ALWAYS
Previous Message Kyotaro Horiguchi 2020-12-03 05:28:57 Re: Huge memory consumption on partitioned table with FKs