Skip site navigation (1) Skip section navigation (2)

Re: [PATCH] Support for foreign keys with arrays

From: Marco Nenciarini <marco(dot)nenciarini(at)2ndQuadrant(dot)it>
To: pgsql-hackers(at)postgresql(dot)org
Subject: Re: [PATCH] Support for foreign keys with arrays
Date: 2012-03-14 18:03:08
Message-ID: 1331748188.6744.15.camel@greygoo.devise-it.lan (view raw or flat)
Thread:
Lists: pgsql-hackers
Hi,

please find attached v4 of the EACH Foreign Key patch (formerly known
also as Foreign Key Array).

All the open issues have been fixed, including the crucial ones; see
below for details.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco(dot)nenciarini(at)2ndQuadrant(dot)it | www.2ndQuadrant.it 

On Fri, Feb 24, 2012 at 09:01:35PM -0500, Noah Misch wrote:
> You support, and have test cases for, constraints with multiple EACH columns.
> The documentation and comments do not explain their semantics.  On reviewing
> the behavior and code, you have implemented it in terms of a Cartesian
> product.  That's logical, but when is such a constraint actually useful?  If
> someone can think of an application, great; let's document his example.
> Otherwise, let's reject such constraints at DDL time.

Now multiple EACH column are rejected.

> 
> > We can use multi-dimensional arrays as well as referencing columns. In 
> > that case though, ON DELETE EACH CASCADE will behave like ON DELETE EACH
> > SET NULL. This is a safe way of implementing the action. 
> > We have some ideas on how to implement this, but we feel it is better to
> > limit the behaviour for now.
> 
> This still feels awfully unprincipled to me.  How about just throwing an error
> when we need to remove an element from a multidimensional array?  Essentially,
> have ON DELETE EACH CASCADE downgrade to ON DELETE RESTRICT when it encounters
> a multidimensional array.  That's strictly less code than what you have now,
> and it keeps our options open.  We can always change from error to set-null
> later, but we can't so readily change from set-null to anything else.

That seems reasonable to me. Implemented and documented.

> As I pondered this patch again, I came upon formal hazards around non-default
> operator classes.  Today, ATAddForeignKeyConstraint() always chooses pfeqop,
> ffeqop and ppeqop in the same operator family.  If it neglected this, we would
> get wrong behavior when one of the operators is sensitive to a change to which
> another is insensitive.  For EACH FK constraints, this patch sets ffeqop =
> ARRAY_EQ_OP unconditionally.  array_eq() uses the TYPECACHE_EQ_OPR (usually
> from the default B-tree operator class).  That operator may not exist at all,
> let alone share an operator family with pfeqop.  Shall we deal with this by
> retrieving TYPECACHE_EQ_OPR in ATAddForeignKeyConstraint() and rejecting the
> constraint creation if it does not share an operator family with pfeqop?  The
> same hazard affects ri_triggers.c use of array_remove() and array_replace(),
> and the same change mitigates it.

Check added. As a consequence of this stricter policy, certain
previously allowed cases are now unacceptable (e.g pk FLOAT and fk
INT[]).
Regression tests have been added.

> Let me say again how much I like the test coverage from this patch.  It's
> great to think of something worth verifying and find existing coverage for it
> in the submitted test cases.
> 
> 
> > *** a/doc/src/sgml/catalogs.sgml
> > --- b/doc/src/sgml/catalogs.sgml
> 
> > ***************
> > *** 2102,2107 ****
> > --- 2106,2118 ----
> >         <entry></entry>
> >         <entry>If a check constraint, a human-readable representation of the expression</entry>
> >        </row>
> > + 
> > +      <row>
> > +       <entry><structfield>confisarray</structfield></entry>
> 
> Name is now confiseach.

Fixed.

> pg_constraint.confeach needs documentation.

Most of pg_constraint columns, including all the boolean ones, are
documented only in the "description" column of

http://www.postgresql.org/docs/9.1/static/catalog-pg-constraint.html#AEN85760

it seems that confiseach should not be an exception, since it just
indicates whether the constraint is of a certain kind or not.

> 
> > +       <entry><type>bool</type></entry>
> > +       <entry></entry>
> > +       <entry>If a foreign key, true if a EACH REFERENCE foreign key</entry>
> > +      </row>
> >       </tbody>
> >      </tgroup>
> >     </table>
> > *** a/doc/src/sgml/ddl.sgml
> > --- b/doc/src/sgml/ddl.sgml
> 
> > +    <para>
> > +     Another option you have with foreign keys is to use a
> > +     referencing column which is an array of elements with
> > +     the same type (or a compatible one) as the referenced
> > +     column in the related table. This feature, commonly known
> > +     as <firstterm>foreign key arrays</firstterm>, is implemented
> 
> Is it indeed "commonly known" by that name?  My web searching did not turn up
> any independent use of the term.
> 
> In any event, this should be the only place where we mention multiple names
> for the feature.  Pick one preferred term and use it for all other mentions.
> The documentation, comments and messages currently have a mix of "each foreign
> key", "EACH foreign key" and "foreign key array".

In the end we uniformly adopted "EACH foreign key" as the unique name
of this feature; documentation, comments and messages have been
updated accordingly.

> > +    <para>
> > +     Even though the most common use case for foreign key arrays
> > +     is on a single column key, you can define an <quote>each foreign
> > +     key constraint</quote> on a group of columns. As the following
> > +     contrived example shows, it needs to be written in table constraint form:
> > + <programlisting>
> > + CREATE TABLE t1 (
> > +   a integer PRIMARY KEY,
> > +   b integer,
> > +   c integer[],
> > +   <emphasis>FOREIGN KEY (b, EACH c) REFERENCES other_table (c1, c2)</emphasis>
> > + );
> > + </programlisting>
> 
> Rather than merely showing an abstract syntax example, some words about the
> semantics are in order.  You might have a parent table with primary key
> (category, value).  Then your child table(s) have a category column and an
> array of values.  The constraint then ensures that all the entries in the
> array exist in the parent table *and* belong to the proper category.

That example has been replaced by a concrete one.

> > *** a/src/backend/commands/tablecmds.c
> > --- b/src/backend/commands/tablecmds.c
> 
> > ***************
> > *** 5736,5741 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
> > --- 5759,5807 ----
> >   				(errcode(ERRCODE_INVALID_FOREIGN_KEY),
> >   				 errmsg("number of referencing and referenced columns for foreign key disagree")));
> >   
> > + 	/* Enforce each foreign key restrictions */
> > + 	if (fkconstraint->fk_is_each)
> > + 	{
> > + 		/*
> > + 		 * ON UPDATE CASCADE action is not supported on EACH foreign keys
> > + 		 */
> > + 		if (fkconstraint->fk_upd_action == FKCONSTR_ACTION_CASCADE)
> > + 			ereport(ERROR,
> > + 				(errcode(ERRCODE_INVALID_FOREIGN_KEY),
> > + 				errmsg("ON UPDATE CASCADE action is not supported on "
> > + 					   "EACH foreign keys"),
> > + 				errhint("Use ON UPDATE EACH CASCADE, instead")));
> 
> Note from the Error Message Style Guide that errhint and errdetail messages
> shall be complete sentences.

Fixed.

> > + 
> > + 		/*
> > + 		 * EACH CASCADE and EACH SET NULL actions are only available
> > + 		 * in single-column EACH foreign keys
> > + 		 */
> > + 		if (numpks > 1 &&
> > + 			(fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRCASCADE
> > + 			|| fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRSETNULL
> > + 			|| fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRCASCADE
> > + 			|| fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRSETNULL))
> > + 			ereport(ERROR,
> > + 				(errcode(ERRCODE_INVALID_FOREIGN_KEY),
> > + 				errmsg("EACH CASCADE and EACH SET NULL actions are only "
> > + 						"available on single-column EACH foreign keys")));
> > + 	}
> > + 	else
> > + 	{
> > + 		/*
> > + 		 * EACH CASCADE and EACH SET NULL actions are only available
> > + 		 * in EACH foreign keys
> > + 		 */
> > + 		if (fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRCASCADE
> > + 			|| fkconstraint->fk_upd_action == FKCONSTR_ACTION_ARRSETNULL
> > + 			|| fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRCASCADE
> > + 			|| fkconstraint->fk_del_action == FKCONSTR_ACTION_ARRSETNULL)
> > + 			ereport(ERROR,
> > + 				(errcode(ERRCODE_INVALID_FOREIGN_KEY),
> > + 				errmsg("EACH CASCADE and EACH SET NULL actions are only "
> > + 					   "available on EACH foreign keys")));
> 
> I would just reuse the previous error message.  It will be fine, maybe even
> better, for everyone to see the phrase "single-column".

Fixed.

> > ***************
> > *** 5812,5823 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
> >   			Oid			target_typeids[2];
> >   
> >   			input_typeids[0] = pktype;
> > ! 			input_typeids[1] = fktype;
> >   			target_typeids[0] = opcintype;
> >   			target_typeids[1] = opcintype;
> >   			if (can_coerce_type(2, input_typeids, target_typeids,
> >   								COERCION_IMPLICIT))
> > ! 				pfeqop = ffeqop = ppeqop;
> >   		}
> >   
> >   		if (!(OidIsValid(pfeqop) && OidIsValid(ffeqop)))
> > --- 5905,5925 ----
> >   			Oid			target_typeids[2];
> >   
> >   			input_typeids[0] = pktype;
> > ! 			input_typeids[1] = fktyped;
> 
> By sending the base type instead of the domain type to can_coerce_type(), this
> changes behavior ever so slightly for ordinary foreign keys.  It would take
> something fairly baroque to get a different result.  Perhaps a polymorphic
> opcintype and an FK column of a domain over an enum.  Still, please don't
> change that part of the logic.

Fixed.

> >   			target_typeids[0] = opcintype;
> >   			target_typeids[1] = opcintype;
> >   			if (can_coerce_type(2, input_typeids, target_typeids,
> >   								COERCION_IMPLICIT))
> > ! 			{
> > ! 				pfeqop = ppeqop;
> > ! 
> > ! 				/*
> > ! 				 * In caso of an EACH FK the ffeqop must be left untouched
> 
> Typo.

Fixed.

> 
> > *** a/src/backend/parser/gram.y
> > --- b/src/backend/parser/gram.y
> 
> > ***************
> > *** 2900,2917 **** ConstraintElem:
> >   								   yyscanner);
> >   					$$ = (Node *)n;
> >   				}
> > ! 			| FOREIGN KEY '(' columnList ')' REFERENCES qualified_name
> >   				opt_column_list key_match key_actions ConstraintAttributeSpec
> >   				{
> >   					Constraint *n = makeNode(Constraint);
> >   					n->contype = CONSTR_FOREIGN;
> >   					n->location = @1;
> >   					n->pktable			= $7;
> > ! 					n->fk_attrs			= $4;
> >   					n->pk_attrs			= $8;
> >   					n->fk_matchtype		= $9;
> >   					n->fk_upd_action	= (char) ($10 >> 8);
> >   					n->fk_del_action	= (char) ($10 & 0xFF);
> >   					processCASbits($11, @11, "FOREIGN KEY",
> >   								   &n->deferrable, &n->initdeferred,
> >   								   &n->skip_validation,
> > --- 2918,2959 ----
> >   								   yyscanner);
> >   					$$ = (Node *)n;
> >   				}
> > ! 			| FOREIGN KEY '(' foreignKeyColumnList ')' REFERENCES qualified_name
> >   				opt_column_list key_match key_actions ConstraintAttributeSpec
> >   				{
> >   					Constraint *n = makeNode(Constraint);
> >   					n->contype = CONSTR_FOREIGN;
> >   					n->location = @1;
> >   					n->pktable			= $7;
> > ! 					n->fk_attrs 		= $4;
> >   					n->pk_attrs			= $8;
> >   					n->fk_matchtype		= $9;
> >   					n->fk_upd_action	= (char) ($10 >> 8);
> >   					n->fk_del_action	= (char) ($10 & 0xFF);
> > + 
> > + 					/*
> > + 					 * Split the content of foreignKeyColumnList
> > + 					 * in two separate list. One lis of fileds
> 
> Two typos.

Fixed.

> > + 					 * and one list of boolean values.
> > + 					 */
> > + 					{
> > + 						ListCell *i;
> > + 
> > + 						n->fk_attrs = NIL;
> > + 						n->fk_is_each = false;
> > + 						n->fk_each_attrs = NIL;
> > + 						foreach (i, $4)
> > + 						{
> > + 							ForeignKeyColumnElem *elem =
> > + 								(ForeignKeyColumnElem *)lfirst(i);
> > + 
> > + 							n->fk_attrs = lappend(n->fk_attrs, elem->name);
> > + 							n->fk_is_each |= elem->each;
> > + 							n->fk_each_attrs = lappend(n->fk_each_attrs,
> > + 													   makeString(elem->each?"t":"f"));
> > + 						}
> > + 					}
> 
> This processing should happen in parse_utilcmd.c rather than gram.y.  Magic
> values "t" and "f" won't do.  Make fk_attrs a list of ForeignKeyColumnElem or
> else at least use an int list of true/false.  Either way, you would no longer
> need the decode loop in tablecmds.c.

Fixed. We have removed the magic values and moved decoding of the
ForeignKeyColumnElem list inside parse_utilcmd.c.

> > *** a/src/backend/utils/adt/arrayfuncs.c
> > --- b/src/backend/utils/adt/arrayfuncs.c
> 
> > + Datum
> > + array_replace(PG_FUNCTION_ARGS)
> > + {
> 
> > + 	for (i = 0; i < nitems; i++)
> > + 	{
> > + 		bool		isNull;
> > + 		bool		oprresult;
> > + 
> > + 		/* Get source element, checking for NULL */
> > + 		if (bitmap && (*bitmap & bitmask) == 0)
> > + 		{
> > + 			if (old_value_isnull)
> > + 			{
> > + 				values[i] = new_value;
> > + 				isNull = false;
> > + 				changed = true;
> > + 			}
> > + 			else
> > + 				isNull = true;
> > + 		}
> 
> This does the wrong thing if old_value_isnull == new_value_isnull == true:
> 
> [local] regression=# select array_replace('{4,null,5}'::int[], null, null);
>  array_replace 
> ---------------
>  {4,0,5}
> 
> (With text[], that yields a crash.)

Fixed.

> > *** a/src/backend/utils/adt/ri_triggers.c
> > --- b/src/backend/utils/adt/ri_triggers.c
> 
> > ! Datum
> > ! RI_FKey_eachcascade_upd(PG_FUNCTION_ARGS)
> >   {
> 
> > ! 				/* ----------
> > ! 				 * The query string built is
> > ! 				 *	UPDATE ONLY <fktable>
> > ! 				 *		SET	fkatt1 = array_replace(fkatt1, $n, $1) [, ...]
> > ! 				 *		WHERE $n = fkatt1 [AND ...]
> > ! 				 * The type id's for the $ parameters are those of the
> > ! 				 * corresponding PK attributes.
> > ! 				 * ----------
> > ! 				 */
> > ! 				initStringInfo(&querybuf);
> > ! 				initStringInfo(&qualbuf);
> > ! 				quoteRelationName(fkrelname, fk_rel);
> > ! 				appendStringInfo(&querybuf, "UPDATE ONLY %s SET", fkrelname);
> > ! 				querysep = "";
> > ! 				qualsep = "WHERE";
> 
> An elog(ERROR) when riinfo.nkeys != 1 would make sense here and for other
> functions where we forbid multiple columns at creation time.  If nothing else,
> this needs a comment that the looping is vestigial and the function only
> supports single-column constraints.

Logic implemented and comment added.

> > ***************
> > *** 2678,2695 **** RI_Initial_Check(Trigger *trigger, Relation fk_rel, Relation pk_rel)
> >   	 * The query string built is:
> >   	 *	SELECT fk.keycols FROM ONLY relname fk
> >   	 *	 LEFT OUTER JOIN ONLY pkrelname pk
> > ! 	 *	 ON (pk.pkkeycol1=fk.keycol1 [AND ...])
> >   	 *	 WHERE pk.pkkeycol1 IS NULL AND
> >   	 * For MATCH unspecified:
> >   	 *	 (fk.keycol1 IS NOT NULL [AND ...])
> >   	 * For MATCH FULL:
> >   	 *	 (fk.keycol1 IS NOT NULL [OR ...])
> >   	 *
> >   	 * We attach COLLATE clauses to the operators when comparing columns
> >   	 * that have different collations.
> >   	 *----------
> >   	 */
> >   	initStringInfo(&querybuf);
> >   	appendStringInfo(&querybuf, "SELECT ");
> >   	sep = "";
> >   	for (i = 0; i < riinfo.nkeys; i++)
> > --- 3527,3557 ----
> >   	 * The query string built is:
> >   	 *	SELECT fk.keycols FROM ONLY relname fk
> >   	 *	 LEFT OUTER JOIN ONLY pkrelname pk
> > ! 	 *	 ON (pk.pkkeycol1=fk.keycol1 [AND ...]
> > ! 	 *	  [AND NOT EXISTS(<RECHECK_SUBQUERY>)])
> >   	 *	 WHERE pk.pkkeycol1 IS NULL AND
> >   	 * For MATCH unspecified:
> >   	 *	 (fk.keycol1 IS NOT NULL [AND ...])
> >   	 * For MATCH FULL:
> >   	 *	 (fk.keycol1 IS NOT NULL [OR ...])
> >   	 *
> > + 	 * In case of an EACH foreign key, a recheck subquery is added to
> > + 	 * the join condition in order to check that every combination of keys
> > + 	 * is actually referenced.
> > + 	 * The RECHECK_SUBQUERY is
> > + 	 *   SELECT 1 FROM
> > + 	 *     unnest(fk.keycol1) x1(x1) [CROSS JOIN ...]
> > + 	 *   LEFT OUTER JOIN ONLY pkrelname pk
> > + 	 *   ON (pk.pkkeycol1=x1.x1 [AND ...])
> > + 	 *   WHERE pk.pkkeycol1 IS NULL AND
> > + 	 *     (fk.keycol1 IS NOT NULL [AND ...])
> 
> What is a worst-case query plan for a constraint with n ordinary keys and m
> EACH keys?

The subquery complexity can be compared with the one of the main query.
Anyway, it is a one-off cost, paid at constraint creation, which I
believe is acceptable. The fact that we now implement only single-column
EACH foreign keys might suggest that we postpone this discussion at a
later time.

> 
> > + 	 *
> >   	 * We attach COLLATE clauses to the operators when comparing columns
> >   	 * that have different collations.
> >   	 *----------
> >   	 */
> 
> > *** a/src/include/nodes/parsenodes.h
> > --- b/src/include/nodes/parsenodes.h
> > ***************
> > *** 570,575 **** typedef struct DefElem
> > --- 570,589 ----
> >   } DefElem;
> >   
> >   /*
> > +  * ForeignKeyColumnElem - foreign key column (used in foreign key constaint)
> 
> Typo.

Fixed.

> 
> > +  *
> > +  * For a foreign key attribute, 'name' is the name of the table column to
> > +  * index, and each is true if it is an EACH fk.
> > +  */
> > + typedef struct ForeignKeyColumnElem
> > + {
> > + 	NodeTag		type;
> > + 	Node	   *name;			/* name of the column, or NULL */
> > + 	bool		each;			/* true if an EACH foreign key */
> > + 
> > + } ForeignKeyColumnElem;
> > + 
> > + /*
> >    * LockingClause - raw representation of FOR UPDATE/SHARE options
> >    *
> >    * Note: lockedRels == NIL means "all relations in query".	Otherwise it
> > ***************
> > *** 1510,1515 **** typedef enum ConstrType			/* types of constraints */
> > --- 1524,1531 ----
> >   #define FKCONSTR_ACTION_CASCADE		'c'
> >   #define FKCONSTR_ACTION_SETNULL		'n'
> >   #define FKCONSTR_ACTION_SETDEFAULT	'd'
> > + #define FKCONSTR_ACTION_ARRCASCADE	'C'
> > + #define FKCONSTR_ACTION_ARRSETNULL	'N'
> 
> These names need update for the array -> each terminology change.

Fixed.

> I consider these the core changes needed to reach Ready for Committer:
> 
> - Fix crash in array_replace(arr, null, null).
> - Don't look through the domain before calling can_coerce_type().
> - Compare operator family of pfeqop and TYPECACHE_EQ_OPR at creation.
> - Move post-processing from gram.y and remove "t"/"f" magic values.

All done.



Attachment: EACH-foreign-key.v4.patch.bz2
Description: application/x-bzip (28.9 KB)

In response to

Responses

pgsql-hackers by date

Next:From: Daniel FarinaDate: 2012-03-14 18:06:16
Subject: Faster compression, again
Previous:From: Robert HaasDate: 2012-03-14 18:02:03
Subject: Re: patch for parallel pg_dump

Privacy Policy | About PostgreSQL
Copyright © 1996-2014 The PostgreSQL Global Development Group