Re: INSERT ... ON CONFLICT documentation clean-up patch

From: Andres Freund <andres(at)anarazel(dot)de>
To: Peter Geoghegan <pg(at)heroku(dot)com>
Cc: Pg Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: INSERT ... ON CONFLICT documentation clean-up patch
Date: 2015-10-14 16:38:35
Message-ID: 20151014163835.GJ30738@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2015-10-11 17:54:24 -0700, Peter Geoghegan wrote:
> + <listitem>
> + <para>
> + <command>INSERT</command> statements with <literal>ON
> + CONFLICT</> clauses do not work sensibly with the partitioning
> + schemes shown here, since trigger functions are not currently
> + capable of examining the structure of the original
> + <command>INSERT</command>. In particular, trigger functions
> + have no way to determine how the <command>INSERT</command>
> + command's author might have intended <quote>redirected</>
> + inserts to handle conflicts in child tables. Even the
> + involvement of an <literal>ON CONFLICT</> clause is not exposed
> + to trigger functions.
> + </para>
> + </listitem>
> +

"do not work sensibly" imo doesn't sound very good in docs. Maybe
something roughly along the lines of "are unlikely to work as expected
as the on conflict action is only taken in case of unique violation on
the target relation, not child relations". I'd remove the whole bit
about triggers not having access to ON CONFLICT.

> </itemizedlist>
> </para>
>
> diff --git a/doc/src/sgml/ref/insert.sgml b/doc/src/sgml/ref/insert.sgml
> index 8caf5fe..0794acb3 100644
> --- a/doc/src/sgml/ref/insert.sgml
> +++ b/doc/src/sgml/ref/insert.sgml
> @@ -99,7 +99,8 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
> <para>
> You must have <literal>INSERT</literal> privilege on a table in
> order to insert into it. If <literal>ON CONFLICT DO UPDATE</> is
> - present the <literal>UPDATE</literal> privilege is also required.
> + present, <literal>UPDATE</literal> privilege on the table is also
> + required.
> </para>

Is this actually an improvement?

> <para>
> @@ -161,10 +162,7 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
> <listitem>
> <para>
> A substitute name for the target table. When an alias is provided, it
> - completely hides the actual name of the table. This is particularly
> - useful when using <literal>ON CONFLICT DO UPDATE</literal> into a table
> - named <literal>excluded</literal> as that's also the name of the
> - pseudo-relation containing the proposed row.
> + completely hides the actual name of the table.
> </para>
> </listitem>
> </varlistentry>

Hm?

> @@ -395,19 +393,23 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
> <parameter>conflict_target</parameter> describes which conflicts
> are handled by the <literal>ON CONFLICT</literal> clause. Either a
> <emphasis>unique index inference</emphasis> clause or an explicitly
> - named constraint can be used. For <literal>ON CONFLICT DO
> - NOTHING</literal>, it is optional to specify a
> - <parameter>conflict_target</parameter>; when omitted, conflicts
> - with all usable constraints (and unique indexes) are handled. For
> - <literal>ON CONFLICT DO UPDATE</literal>, a conflict target
> - <emphasis>must</emphasis> be specified.
> + named constraint can be used. These constraints and/or unique
> + indexes are <firstterm>arbiter indexes</firstterm>.

I'm not convinced it's an improvement to talk about arbiter indexes
here.

> + For
> + <literal>ON CONFLICT DO NOTHING</literal>, it is optional to
> + specify a <parameter>conflict_target</parameter>; when omitted,
> + conflicts with all usable constraints (and unique indexes) are
> + handled. For <literal>ON CONFLICT DO UPDATE</literal>, a
> + <parameter>conflict_target</parameter> <emphasis>must</emphasis> be
> + provided.

Yes, that's a bit clearer.

> Every time an insertion without <literal>ON CONFLICT</literal>
> would ordinarily raise an error due to violating one of the
> - inferred (or explicitly named) constraints, a conflict (as in
> - <literal>ON CONFLICT</literal>) occurs, and the alternative action,
> - as specified by <parameter>conflict_action</parameter> is taken.
> - This happens on a row-by-row basis.
> + inferred constraints/indexes (or an explicitly named constraint), a
> + conflict (as in <literal>ON CONFLICT</literal>) occurs, and the
> + alternative action, as specified by
> + <parameter>conflict_action</parameter> is taken. This happens on a
> + row-by-row basis. Only <literal>NOT DEFERRABLE</literal>
> + constraints are supported as arbiters.
> </para>

I'm inclined ot only add the "Only <literal>NOT DEFERRABLE</literal> constraints are
supported as arbiters." bit - the rest doesn't really seem to be an
improvement?

> <para>
> @@ -425,17 +427,28 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
> specified columns/expressions and, if specified, whose predicate
> implies the <replaceable class="PARAMETER">
> index_predicate</replaceable> are chosen as arbiter indexes. Note
> - that this means an index without a predicate will be used if a
> - non-partial index matching every other criteria happens to be
> - available.
> + that this means a unique index without a predicate will be inferred
> + (and used by <literal>ON CONFLICT</literal> as an arbiter) if such
> + an index satisfying every other criteria is available.
> </para>

Hm. I agree that the "happens to be available" sounds a bit too
casual. But I think the sentence was easier to understand for endusers
before?

> + <tip>
> + <para>
> + A unique index inference clause should be preferred over naming a
> + constraint directly using <literal>ON CONFLICT ON
> + CONSTRAINT</literal> <replaceable class="PARAMETER">
> + constraint_name</replaceable>. Unique index inference is
> + adaptable to nonsignificant changes in the available constraints.
> + Furthermore, it is possible for more than one constraint and/or
> + unique index to be inferred for the same statement.
> + </para>
> + </tip>

I'd formulate this a more neutral. How something like "... inference
...

> <para>
> <parameter>conflict_action</parameter> defines the action to be
> taken in case of conflict. <literal>ON CONFLICT DO
> @@ -447,11 +460,8 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
> <literal>ON CONFLICT DO UPDATE</literal> guarantees an atomic
> <command>INSERT</command> or <command>UPDATE</command> outcome - provided
> there is no independent error, one of those two outcomes is guaranteed,
> - even under high concurrency. This feature is also known as
> - <firstterm>UPSERT</firstterm>.
> -
> - Note that exclusion constraints are not supported with
> - <literal>ON CONFLICT DO UPDATE</literal>.
> + even under high concurrency. This is also known as
> + <firstterm>UPSERT</firstterm> &mdash; <quote>UPDATE or INSERT</quote>.
> </para>

>
> @@ -483,8 +494,10 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
> that the command will not be allowed to affect any single existing
> row more than once; a cardinality violation error will be raised
> when this situation arises. Rows proposed for insertion should not
> - duplicate each other in terms of attributes constrained by the
> - conflict-arbitrating unique index.
> + duplicate each other in terms of attributes constrained by a
> + conflict-arbitrating index or constraint. Note that exclusion
> + constraints are not supported with <literal>ON CONFLICT DO
> + UPDATE</literal>.
> </para>
> </refsect1>
>

Why did you move the exclusion constraint bit? Isn't it more appropriate
in the action section?

> <para>
> @@ -466,14 +476,15 @@ INSERT INTO <replaceable class="PARAMETER">table_name</replaceable> [ AS <replac
>
> <para>
> The <literal>SET</literal> and <literal>WHERE</literal> clauses in
> - <literal>ON CONFLICT UPDATE</literal> have access to the existing
> - row, using the table's name, and to the row
> - proposed for insertion, using the <varname>excluded</varname>
> - alias. The <varname>excluded</varname> alias requires
> - <literal>SELECT</> privilege on any column whose values are read.
> + <literal>ON CONFLICT DO UPDATE</literal> have access to the
> + existing row using the table's name (or an alias), and to the row
> + proposed for insertion using the special
> + <varname>EXCLUDED</varname> table. <literal>SELECT</> privilege is
> + required on any column in the target table where corresponding
> + <varname>EXCLUDED</varname> columns are read.

> Note that the effects of all per-row <literal>BEFORE INSERT</literal>
> - triggers are reflected in <varname>excluded</varname> values, since those
> + triggers are reflected in <varname>EXCLUDED</varname> values, since those
> effects may have contributed to the row being excluded from insertion.
> </para>

It's not correct to use uppercase EXCLUDED here imo - the pseudo table
is named "excluded" which is important in case the table name gets
quoted.

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Fetter 2015-10-14 16:41:46 Re: Proposal: pg_confcheck - syntactic & semantic validation of postgresql configuration files
Previous Message Jeff Janes 2015-10-14 16:34:04 pg_restore cancel TODO