Re: [PATCH] Support for foreign keys with arrays

From: Noah Misch <noah(at)leadboat(dot)com>
To: Gabriele Bartolini <gabriele(dot)bartolini(at)2ndQuadrant(dot)it>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Subject: Re: [PATCH] Support for foreign keys with arrays
Date: 2011-11-17 04:28:41
Message-ID: 20111117042841.GA20517@tornado.leadboat.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Gabriele,

On Fri, Nov 04, 2011 at 01:48:02PM +0100, Gabriele Bartolini wrote:
> CREATE TABLE pt (
> id INTEGER PRIMARY KEY,
> ...
> );
>
> CREATE TABLE ft (
> id SERIAL PRIMARY KEY,
> pids INTEGER[] REFERENCES pt,
> ...
> );

This seems useful.

I'm assuming the SQL spec says nothing about a feature like this?

> This patch is for discussion and has been built against HEAD.
> It compiles and passes all regressions tests (including specific ones -
> see the src/test/regress/sql/foreign_key.sql file).
> Empty arrays, multi-dimensional arrays, duplicate elements and NULL
> values are allowed.

With this patch, RI_Initial_Check does not detect a violation in an array that
contains at least one conforming element:

BEGIN;
CREATE TABLE parent (c int PRIMARY KEY);
CREATE TABLE child (c int[]);
INSERT INTO parent VALUES (1);
INSERT INTO child VALUES ('{3,1,2}');
ALTER TABLE child ADD FOREIGN KEY (c) REFERENCES parent; -- should error
INSERT INTO child VALUES ('{3,1,2}'); -- does error, as expected
ROLLBACK;

The error message DETAIL on constraint violation would benefit from
array-FK-specific language. Example of current message:

ERROR: insert or update on table "child" violates foreign key constraint "child_c_fkey"
DETAIL: Key (c)=({3,1,2}) is not present in table "parent".

The patch is missing a change to the code that does FK=FK checks when a user
updates the FK side:

\set VERBOSITY verbose
BEGIN;
CREATE TABLE parent (c int PRIMARY KEY);
CREATE TABLE child (c int[] REFERENCES parent);
INSERT INTO parent VALUES (1);
INSERT INTO child VALUES ('{1,1}');
COMMIT;
-- ERROR: XX000: no conversion function from integer[] to integer
-- LOCATION: ri_HashCompareOp, ri_triggers.c:4097
UPDATE child SET c = '{1,1}';
DROP TABLE parent, child;
COMMIT;

Please audit each ri_triggers.c entry point for further problems like this.

> We had to enforce some limitations, due to the lack (yet) of a clear and
> universally accepted behaviour and strategy.
> For example, consider the ON DELETE action on the above tables: in case
> of delete of a record in the 'pt' table, should we remove the whole row
> or just the values from the array?
> We hope we can start a discussion from here.

Removing values from the array seems best to me. There's no doubt about what
ON UPDATE CASCADE should do, and having ON DELETE CASCADE excise individual
array elements is consistent with that. It's less clear for SET NULL, but I'd
continue with a per-element treatment. I'd continue to forbid SET DEFAULT.

However, Jeff Davis did expect ON DELETE CASCADE to remove entire rows:
http://archives.postgresql.org/message-id/1288119207.15279.24.camel@jdavis-ux.asterdata.local
So, perhaps the behavior needs to be user-selectable.

> Current limitations:
>
> * Only arrays of the same type as the primary key in the referenced
> table are supported

This is understandable for a WIP, but the final patch will need to use our
existing, looser foreign key type match requirement.

> * multi-column foreign keys are not supported (only single column)

Any particular reason for this?

> *** a/doc/src/sgml/ddl.sgml
> --- b/doc/src/sgml/ddl.sgml
> ***************
> *** 764,769 **** CREATE TABLE order_items (
> --- 764,796 ----
> the last table.
> </para>
>
> + <para>
> + Another option you have with foreign keys is to use a referencing column
> + which is an array of elements with the same type as the referenced column
> + in the related table. This feature, also known as <firstterm>foreign key arrays</firstterm>,
> + is described in the following example:

Please wrap your documentation paragraphs.

> *** a/src/backend/commands/tablecmds.c
> --- b/src/backend/commands/tablecmds.c
> ***************
> *** 5705,5710 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
> --- 5705,5735 ----
> Oid ffeqop;
> int16 eqstrategy;
>
> + /* Check if foreign key is an array of primary key types */
> + const bool is_foreign_key_array = (fktype == get_array_type (pktype));

We don't declare non-pointer, local variables "const". Also, [not positive on
this one] when an initial assignment requires a comment, declare the variable
with no assignment and no comment. Then, assign it later with the comment.
This keeps the per-block declarations packed together.

This test wrongly rejects FK types that are domains over the array type:

BEGIN;
CREATE TABLE parent (c int PRIMARY KEY);
CREATE DOMAIN intarrdom AS int[];
CREATE TABLE child (c intarrdom REFERENCES parent);
ROLLBACK;

> +
> + /* Enforce foreign key array restrictions */
> + if (is_foreign_key_array)
> + {
> + /*
> + * Foreign key array must not be part of a multi-column foreign key
> + */
> + if (is_foreign_key_array && numpks > 1)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> + errmsg("foreign key arrays must not be part of a multi-column foreign key")));
> +
> + /*
> + * We have to restrict foreign key array to NO ACTION and RESTRICT mode
> + * until the behaviour triggered by the other actions is clearer and well defined
> + */
> + if ((fkconstraint->fk_upd_action != FKCONSTR_ACTION_NOACTION && fkconstraint->fk_upd_action != FKCONSTR_ACTION_RESTRICT)
> + || (fkconstraint->fk_del_action != FKCONSTR_ACTION_NOACTION && fkconstraint->fk_del_action != FKCONSTR_ACTION_RESTRICT))

Break these lines to keep things within 78 columns. Audit the remainder of
your changes for long lines, and break when in doubt.

> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> + errmsg("NO ACTION and RESTRICT are the only supported actions for foreign key arrays")));

Error message constants can remain unbroken, though.

> + }
> +
> /* We need several fields out of the pg_opclass entry */
> cla_ht = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclasses[i]));
> if (!HeapTupleIsValid(cla_ht))
> ***************
> *** 5766,5772 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
> Oid target_typeids[2];
>
> input_typeids[0] = pktype;
> ! input_typeids[1] = fktype;
> target_typeids[0] = opcintype;
> target_typeids[1] = opcintype;
> if (can_coerce_type(2, input_typeids, target_typeids,
> --- 5791,5801 ----
> Oid target_typeids[2];
>
> input_typeids[0] = pktype;
> ! /* When is FKA we must use for FK the same type of PK */
> ! if (is_foreign_key_array)
> ! input_typeids[1] = pktype;
> ! else
> ! input_typeids[1] = fktype;
> target_typeids[0] = opcintype;
> target_typeids[1] = opcintype;
> if (can_coerce_type(2, input_typeids, target_typeids,

This is bogus; the can_coerce_type test will always pass (excluding bad cases
of catalog inconsistency).

ATAddForeignKeyConstraint should choose to make an array foreign key whenever
the PK side is a scalar and the FK side is an array. Then, grab the element
type of the FK side and feed that through the operator-identification logic.

> *** a/src/backend/utils/adt/ri_triggers.c
> --- b/src/backend/utils/adt/ri_triggers.c
> ***************
> *** 460,465 **** RI_FKey_check(PG_FUNCTION_ARGS)
> --- 460,466 ----
> char paramname[16];
> const char *querysep;
> Oid queryoids[RI_MAX_NUMKEYS];
> + bool is_foreign_key_array = false;
>
> /* ----------
> * The query string built is
> ***************
> *** 476,493 **** RI_FKey_check(PG_FUNCTION_ARGS)
> {
> Oid pk_type = RIAttType(pk_rel, riinfo.pk_attnums[i]);
> Oid fk_type = RIAttType(fk_rel, riinfo.fk_attnums[i]);
>
> quoteOneName(attname,
> RIAttName(pk_rel, riinfo.pk_attnums[i]));
> sprintf(paramname, "$%d", i + 1);
> ! ri_GenerateQual(&querybuf, querysep,
> ! attname, pk_type,
> ! riinfo.pf_eq_oprs[i],
> ! paramname, fk_type);
> querysep = "AND";
> queryoids[i] = fk_type;
> }
> ! appendStringInfo(&querybuf, " FOR SHARE OF x");
>
> /* Prepare and save the plan */
> qplan = ri_PlanCheck(querybuf.data, riinfo.nkeys, queryoids,
> --- 477,524 ----
> {
> Oid pk_type = RIAttType(pk_rel, riinfo.pk_attnums[i]);
> Oid fk_type = RIAttType(fk_rel, riinfo.fk_attnums[i]);
> + is_foreign_key_array = (fk_type == get_array_type (pk_type));

Drop the extra whitespace before the function argument list.

>
> quoteOneName(attname,
> RIAttName(pk_rel, riinfo.pk_attnums[i]));
> sprintf(paramname, "$%d", i + 1);
> ! /*
> ! * In case of an array foreign key, we check that every
> ! * DISTINCT NOT NULL value in the array is present in the PK table.
> ! * XXX: This works because the query is executed with LIMIT 1,

I found this comment confusing, since the SQL syntax "LIMIT 1" is never used
here. I suppose you're referring to the fact that we call into SPI with
tcount = 1?

> ! * but may not work properly with SSI (a better approach would be
> ! * to inspect the array and skip the check in case of empty arrays).

Why might serializable transactions be specially affected?

> ! */
> ! if (is_foreign_key_array)
> ! {
> ! appendStringInfo(&querybuf, " %s (SELECT count(*) FROM (SELECT DISTINCT UNNEST(%s)) y WHERE y IS NOT NULL)", querysep, paramname);
> ! appendStringInfo(&querybuf, " = (SELECT count(*) FROM (SELECT 1 FROM ONLY %s y", pkrelname);
> ! ri_GenerateQual(&querybuf, "WHERE",
> ! attname, pk_type,
> ! riinfo.pf_eq_oprs[i],
> ! paramname, fk_type);
> ! /*
> ! * We lock for share every row in the pkreltable that is
> ! * referenced by the array elements
> ! */
> ! appendStringInfo(&querybuf, " FOR SHARE OF y) z)");

The resulting query performs an irrelevant sequential scan on the PK table:

SELECT 1 FROM ONLY "public"."parent" x WHERE (SELECT count(*) FROM (SELECT DISTINCT UNNEST($1)) y WHERE y IS NOT NULL) = (SELECT count(*) FROM (SELECT 1 FROM ONLY "public"."parent" y WHERE "c" OPERATOR(pg_catalog.=) ANY ($1) FOR SHARE OF y) z)

As you suggested with that comment above, this scan always ends after one row.
That places a bound on the actual performance hit. However, we still read the
one row, which may mean loading a page for nothing. At a minimum, simplify
this query to:

SELECT 1 WHERE (SELECT count(*) FROM (SELECT DISTINCT UNNEST($1)) y WHERE y IS NOT NULL) = (SELECT count(*) FROM (SELECT 1 FROM ONLY "public"."parent" y WHERE "c" OPERATOR(pg_catalog.=) ANY ($1) FOR SHARE OF y) z)

That also naturally handles empty arrays against empty PK tables, which
currently fail for me even at READ COMMITTED:

BEGIN;
CREATE TABLE parent (c int PRIMARY KEY);
CREATE TABLE child (c int[] REFERENCES parent);
INSERT INTO child VALUES ('{}'); -- fails wrongly
ROLLBACK;

> ! }
> ! else
> ! {
> ! ri_GenerateQual(&querybuf, querysep,
> ! attname, pk_type,
> ! riinfo.pf_eq_oprs[i],
> ! paramname, fk_type);
> ! }
> querysep = "AND";
> queryoids[i] = fk_type;
> }
> ! /*
> ! * We skip locking for share in case of foreign key arrays
> ! * as it has been done in the inner subquery
> ! */
> ! if (! is_foreign_key_array)

Drop the whitespace after the "!".

> ! appendStringInfo(&querybuf, " FOR SHARE OF x");
>
> /* Prepare and save the plan */
> qplan = ri_PlanCheck(querybuf.data, riinfo.nkeys, queryoids,

> *** a/src/test/regress/expected/foreign_key.out
> --- b/src/test/regress/expected/foreign_key.out
> ***************
> *** 968,978 **** drop table pktable;
> drop table pktable_base;
> -- 2 columns (1 table), mismatched types
> create table pktable_base(base1 int not null, base2 int);
> - create table pktable(ptest1 inet, ptest2 inet[], primary key(base1, ptest1), foreign key(base2, ptest2) references
> - pktable(base1, ptest1)) inherits (pktable_base);
> - NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "pktable_pkey" for table "pktable"
> - ERROR: foreign key constraint "pktable_base2_fkey" cannot be implemented
> - DETAIL: Key columns "ptest2" and "ptest1" are of incompatible types: inet[] and inet.

Instead of deleting this test, change the type from inet[] to something
unrelated, like float8.

Thanks,
nm

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2011-11-17 04:31:08 Re: FlexLocks
Previous Message Robert Haas 2011-11-17 04:26:16 Re: Minor optimisation of XLogInsert()