Re: [PATCH] Support for foreign keys with arrays

From: Noah Misch <noah(at)leadboat(dot)com>
To: Marco Nenciarini <marco(dot)nenciarini(at)2ndQuadrant(dot)it>
Cc: Gabriele Bartolini <gabriele(dot)bartolini(at)2ndQuadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [PATCH] Support for foreign keys with arrays
Date: 2012-02-25 02:01:35
Message-ID: 20120225020135.GA27204@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Marco,

This version fixes everything I noted in my last review. Apart from corner
cases I note, the patch is technically sound. I pose a few policy-type
questions; comments from a broader audience would help those.

On Mon, Feb 06, 2012 at 07:04:42PM +0100, Marco Nenciarini wrote:
> Please find attached version 3 of our patch. We thoroughly followed your
> suggestions and were able to implement "EACH foreign key constraints"
> with multi-column syntax as well.

[actual review based on v3b]

> * support for EACH foreign keys in multi-column foreign key table
> constraints
> - e.g. FOREIGN KEY (c1, EACH c2) REFERENCES t1 (u1, u2)

You support, and have test cases for, constraints with multiple EACH columns.
The documentation and comments do not explain their semantics. On reviewing
the behavior and code, you have implemented it in terms of a Cartesian
product. That's logical, but when is such a constraint actually useful? If
someone can think of an application, great; let's document his example.
Otherwise, let's reject such constraints at DDL time.

> As previously said, we preferred to keep this patch simple for 9.2 and
> forbid EACH CASCADE and EACH SET NULL on multi-column foreign keys.
> After all, majority of use cases is represented by EACH foreign key
> column constraints (single-column foreign key arrays), and more
> complicated use cases can be discussed for 9.3 - should this patch make
> it. :)

Good call.

> We can use multi-dimensional arrays as well as referencing columns. In
> that case though, ON DELETE EACH CASCADE will behave like ON DELETE EACH
> SET NULL. This is a safe way of implementing the action.
> We have some ideas on how to implement this, but we feel it is better to
> limit the behaviour for now.

This still feels awfully unprincipled to me. How about just throwing an error
when we need to remove an element from a multidimensional array? Essentially,
have ON DELETE EACH CASCADE downgrade to ON DELETE RESTRICT when it encounters
a multidimensional array. That's strictly less code than what you have now,
and it keeps our options open. We can always change from error to set-null
later, but we can't so readily change from set-null to anything else.

As I pondered this patch again, I came upon formal hazards around non-default
operator classes. Today, ATAddForeignKeyConstraint() always chooses pfeqop,
ffeqop and ppeqop in the same operator family. If it neglected this, we would
get wrong behavior when one of the operators is sensitive to a change to which
another is insensitive. For EACH FK constraints, this patch sets ffeqop =
ARRAY_EQ_OP unconditionally. array_eq() uses the TYPECACHE_EQ_OPR (usually
from the default B-tree operator class). That operator may not exist at all,
let alone share an operator family with pfeqop. Shall we deal with this by
retrieving TYPECACHE_EQ_OPR in ATAddForeignKeyConstraint() and rejecting the
constraint creation if it does not share an operator family with pfeqop? The
same hazard affects ri_triggers.c use of array_remove() and array_replace(),
and the same change mitigates it.

Let me say again how much I like the test coverage from this patch. It's
great to think of something worth verifying and find existing coverage for it
in the submitted test cases.

> *** a/doc/src/sgml/catalogs.sgml
> --- b/doc/src/sgml/catalogs.sgml

> ***************
> *** 2102,2107 ****
> --- 2106,2118 ----
> <entry></entry>
> <entry>If a check constraint, a human-readable representation of the expression</entry>
> </row>
> +
> + <row>
> + <entry><structfield>confisarray</structfield></entry>

Name is now confiseach.

pg_constraint.confeach needs documentation.

> + <entry><type>bool</type></entry>
> + <entry></entry>
> + <entry>If a foreign key, true if a EACH REFERENCE foreign key</entry>
> + </row>
> </tbody>
> </tgroup>
> </table>
> *** a/doc/src/sgml/ddl.sgml
> --- b/doc/src/sgml/ddl.sgml

> + <para>
> + Another option you have with foreign keys is to use a
> + referencing column which is an array of elements with
> + the same type (or a compatible one) as the referenced
> + column in the related table. This feature, commonly known
> + as <firstterm>foreign key arrays</firstterm>, is implemented

Is it indeed "commonly known" by that name? My web searching did not turn up
any independent use of the term.

In any event, this should be the only place where we mention multiple names
for the feature. Pick one preferred term and use it for all other mentions.
The documentation, comments and messages currently have a mix of "each foreign
key", "EACH foreign key" and "foreign key array".

> + <para>
> + Even though the most common use case for foreign key arrays
> + is on a single column key, you can define an <quote>each foreign
> + key constraint</quote> on a group of columns. As the following
> + contrived example shows, it needs to be written in table constraint form:
> + <programlisting>
> + CREATE TABLE t1 (
> + a integer PRIMARY KEY,
> + b integer,
> + c integer[],
> + <emphasis>FOREIGN KEY (b, EACH c) REFERENCES other_table (c1, c2)</emphasis>
> + );
> + </programlisting>

Rather than merely showing an abstract syntax example, some words about the
semantics are in order. You might have a parent table with primary key
(category, value). Then your child table(s) have a category column and an
array of values. The constraint then ensures that all the entries in the
array exist in the parent table *and* belong to the proper category.

> *** a/src/backend/commands/tablecmds.c
> --- b/src/backend/commands/tablecmds.c

> ***************
> *** 5736,5741 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
> --- 5759,5807 ----
> (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> errmsg("number of referencing and referenced columns for foreign key disagree")));
>
> + /* Enforce each foreign key restrictions */
> + if (fkconstraint->fk_is_each)
> + {
> + /*
> + * ON UPDATE CASCADE action is not supported on EACH foreign keys
> + */
> + if (fkconstraint->fk_upd_action == FKCONSTR_ACTION_CASCADE)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> + errmsg("ON UPDATE CASCADE action is not supported on "
> + "EACH foreign keys"),
> + errhint("Use ON UPDATE EACH CASCADE, instead")));

Note from the Error Message Style Guide that errhint and errdetail messages
shall be complete sentences.

> +
> + /*
> + * EACH CASCADE and EACH SET NULL actions are only available
> + * in single-column EACH foreign keys
> + */
> + if (numpks > 1 &&
> + (fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRCASCADE
> + || fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRSETNULL
> + || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRCASCADE
> + || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRSETNULL))
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> + errmsg("EACH CASCADE and EACH SET NULL actions are only "
> + "available on single-column EACH foreign keys")));
> + }
> + else
> + {
> + /*
> + * EACH CASCADE and EACH SET NULL actions are only available
> + * in EACH foreign keys
> + */
> + if (fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRCASCADE
> + || fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRSETNULL
> + || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRCASCADE
> + || fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRSETNULL)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> + errmsg("EACH CASCADE and EACH SET NULL actions are only "
> + "available on EACH foreign keys")));

I would just reuse the previous error message. It will be fine, maybe even
better, for everyone to see the phrase "single-column".

> ***************
> *** 5812,5823 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
> Oid target_typeids[2];
>
> input_typeids[0] = pktype;
> ! input_typeids[1] = fktype;
> target_typeids[0] = opcintype;
> target_typeids[1] = opcintype;
> if (can_coerce_type(2, input_typeids, target_typeids,
> COERCION_IMPLICIT))
> ! pfeqop = ffeqop = ppeqop;
> }
>
> if (!(OidIsValid(pfeqop) && OidIsValid(ffeqop)))
> --- 5905,5925 ----
> Oid target_typeids[2];
>
> input_typeids[0] = pktype;
> ! input_typeids[1] = fktyped;

By sending the base type instead of the domain type to can_coerce_type(), this
changes behavior ever so slightly for ordinary foreign keys. It would take
something fairly baroque to get a different result. Perhaps a polymorphic
opcintype and an FK column of a domain over an enum. Still, please don't
change that part of the logic.

> target_typeids[0] = opcintype;
> target_typeids[1] = opcintype;
> if (can_coerce_type(2, input_typeids, target_typeids,
> COERCION_IMPLICIT))
> ! {
> ! pfeqop = ppeqop;
> !
> ! /*
> ! * In caso of an EACH FK the ffeqop must be left untouched

Typo.

> *** a/src/backend/parser/gram.y
> --- b/src/backend/parser/gram.y

> ***************
> *** 2900,2917 **** ConstraintElem:
> yyscanner);
> $$ = (Node *)n;
> }
> ! | FOREIGN KEY '(' columnList ')' REFERENCES qualified_name
> opt_column_list key_match key_actions ConstraintAttributeSpec
> {
> Constraint *n = makeNode(Constraint);
> n->contype = CONSTR_FOREIGN;
> n->location = @1;
> n->pktable = $7;
> ! n->fk_attrs = $4;
> n->pk_attrs = $8;
> n->fk_matchtype = $9;
> n->fk_upd_action = (char) ($10 >> 8);
> n->fk_del_action = (char) ($10 & 0xFF);
> processCASbits($11, @11, "FOREIGN KEY",
> &n->deferrable, &n->initdeferred,
> &n->skip_validation,
> --- 2918,2959 ----
> yyscanner);
> $$ = (Node *)n;
> }
> ! | FOREIGN KEY '(' foreignKeyColumnList ')' REFERENCES qualified_name
> opt_column_list key_match key_actions ConstraintAttributeSpec
> {
> Constraint *n = makeNode(Constraint);
> n->contype = CONSTR_FOREIGN;
> n->location = @1;
> n->pktable = $7;
> ! n->fk_attrs = $4;
> n->pk_attrs = $8;
> n->fk_matchtype = $9;
> n->fk_upd_action = (char) ($10 >> 8);
> n->fk_del_action = (char) ($10 & 0xFF);
> +
> + /*
> + * Split the content of foreignKeyColumnList
> + * in two separate list. One lis of fileds

Two typos.

> + * and one list of boolean values.
> + */
> + {
> + ListCell *i;
> +
> + n->fk_attrs = NIL;
> + n->fk_is_each = false;
> + n->fk_each_attrs = NIL;
> + foreach (i, $4)
> + {
> + ForeignKeyColumnElem *elem =
> + (ForeignKeyColumnElem *)lfirst(i);
> +
> + n->fk_attrs = lappend(n->fk_attrs, elem->name);
> + n->fk_is_each |= elem->each;
> + n->fk_each_attrs = lappend(n->fk_each_attrs,
> + makeString(elem->each?"t":"f"));
> + }
> + }

This processing should happen in parse_utilcmd.c rather than gram.y. Magic
values "t" and "f" won't do. Make fk_attrs a list of ForeignKeyColumnElem or
else at least use an int list of true/false. Either way, you would no longer
need the decode loop in tablecmds.c.

> *** a/src/backend/utils/adt/arrayfuncs.c
> --- b/src/backend/utils/adt/arrayfuncs.c

> + Datum
> + array_replace(PG_FUNCTION_ARGS)
> + {

> + for (i = 0; i < nitems; i++)
> + {
> + bool isNull;
> + bool oprresult;
> +
> + /* Get source element, checking for NULL */
> + if (bitmap && (*bitmap & bitmask) == 0)
> + {
> + if (old_value_isnull)
> + {
> + values[i] = new_value;
> + isNull = false;
> + changed = true;
> + }
> + else
> + isNull = true;
> + }

This does the wrong thing if old_value_isnull == new_value_isnull == true:

[local] regression=# select array_replace('{4,null,5}'::int[], null, null);
array_replace
---------------
{4,0,5}

(With text[], that yields a crash.)

> *** a/src/backend/utils/adt/ri_triggers.c
> --- b/src/backend/utils/adt/ri_triggers.c

> ! Datum
> ! RI_FKey_eachcascade_upd(PG_FUNCTION_ARGS)
> {

> ! /* ----------
> ! * The query string built is
> ! * UPDATE ONLY <fktable>
> ! * SET fkatt1 = array_replace(fkatt1, $n, $1) [, ...]
> ! * WHERE $n = fkatt1 [AND ...]
> ! * The type id's for the $ parameters are those of the
> ! * corresponding PK attributes.
> ! * ----------
> ! */
> ! initStringInfo(&querybuf);
> ! initStringInfo(&qualbuf);
> ! quoteRelationName(fkrelname, fk_rel);
> ! appendStringInfo(&querybuf, "UPDATE ONLY %s SET", fkrelname);
> ! querysep = "";
> ! qualsep = "WHERE";

An elog(ERROR) when riinfo.nkeys != 1 would make sense here and for other
functions where we forbid multiple columns at creation time. If nothing else,
this needs a comment that the looping is vestigial and the function only
supports single-column constraints.

> ***************
> *** 2678,2695 **** RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
> * The query string built is:
> * SELECT fk.keycols FROM ONLY relname fk
> * LEFT OUTER JOIN ONLY pkrelname pk
> ! * ON (pk.pkkeycol1=fk.keycol1 [AND ...])
> * WHERE pk.pkkeycol1 IS NULL AND
> * For MATCH unspecified:
> * (fk.keycol1 IS NOT NULL [AND ...])
> * For MATCH FULL:
> * (fk.keycol1 IS NOT NULL [OR ...])
> *
> * We attach COLLATE clauses to the operators when comparing columns
> * that have different collations.
> *----------
> */
> initStringInfo(&querybuf);
> appendStringInfo(&querybuf, "SELECT ");
> sep = "";
> for (i = 0; i < riinfo.nkeys; i++)
> --- 3527,3557 ----
> * The query string built is:
> * SELECT fk.keycols FROM ONLY relname fk
> * LEFT OUTER JOIN ONLY pkrelname pk
> ! * ON (pk.pkkeycol1=fk.keycol1 [AND ...]
> ! * [AND NOT EXISTS(<RECHECK_SUBQUERY>)])
> * WHERE pk.pkkeycol1 IS NULL AND
> * For MATCH unspecified:
> * (fk.keycol1 IS NOT NULL [AND ...])
> * For MATCH FULL:
> * (fk.keycol1 IS NOT NULL [OR ...])
> *
> + * In case of an EACH foreign key, a recheck subquery is added to
> + * the join condition in order to check that every combination of keys
> + * is actually referenced.
> + * The RECHECK_SUBQUERY is
> + * SELECT 1 FROM
> + * unnest(fk.keycol1) x1(x1) [CROSS JOIN ...]
> + * LEFT OUTER JOIN ONLY pkrelname pk
> + * ON (pk.pkkeycol1=x1.x1 [AND ...])
> + * WHERE pk.pkkeycol1 IS NULL AND
> + * (fk.keycol1 IS NOT NULL [AND ...])

What is a worst-case query plan for a constraint with n ordinary keys and m
EACH keys?

> + *
> * We attach COLLATE clauses to the operators when comparing columns
> * that have different collations.
> *----------
> */

> *** a/src/include/nodes/parsenodes.h
> --- b/src/include/nodes/parsenodes.h
> ***************
> *** 570,575 **** typedef struct DefElem
> --- 570,589 ----
> } DefElem;
>
> /*
> + * ForeignKeyColumnElem - foreign key column (used in foreign key constaint)

Typo.

> + *
> + * For a foreign key attribute, 'name' is the name of the table column to
> + * index, and each is true if it is an EACH fk.
> + */
> + typedef struct ForeignKeyColumnElem
> + {
> + NodeTag type;
> + Node *name; /* name of the column, or NULL */
> + bool each; /* true if an EACH foreign key */
> +
> + } ForeignKeyColumnElem;
> +
> + /*
> * LockingClause - raw representation of FOR UPDATE/SHARE options
> *
> * Note: lockedRels == NIL means "all relations in query". Otherwise it
> ***************
> *** 1510,1515 **** typedef enum ConstrType /* types of constraints */
> --- 1524,1531 ----
> #define FKCONSTR_ACTION_CASCADE 'c'
> #define FKCONSTR_ACTION_SETNULL 'n'
> #define FKCONSTR_ACTION_SETDEFAULT 'd'
> + #define FKCONSTR_ACTION_ARRCASCADE 'C'
> + #define FKCONSTR_ACTION_ARRSETNULL 'N'

These names need update for the array -> each terminology change.

I consider these the core changes needed to reach Ready for Committer:

- Fix crash in array_replace(arr, null, null).
- Don't look through the domain before calling can_coerce_type().
- Compare operator family of pfeqop and TYPECACHE_EQ_OPR at creation.
- Move post-processing from gram.y and remove "t"/"f" magic values.

Thanks,
nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2012-02-25 02:31:50 Re: Runtime SHAREDIR for testing CREATE EXTENSION
Previous Message Vik Reykja 2012-02-25 01:06:49 Re: foreign key locks, 2nd attempt