Re: [PATCH] Support for foreign keys with arrays

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Gabriele Bartolini <gabriele(dot)bartolini(at)2ndquadrant(dot)it>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Marco Nenciarini <marco(dot)nenciarini(at)2ndquadrant(dot)it>
Subject: Re: [PATCH] Support for foreign keys with arrays
Date: 2011-11-17 04:50:27
Message-ID: CAFj8pRCEhyC_PK3jJHE5ZS8Qo06+OBpbcjAkELW0skOgewgA5w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello

2011/11/17 Noah Misch <noah(at)leadboat(dot)com>:
> Hi Gabriele,
>
> On Fri, Nov 04, 2011 at 01:48:02PM +0100, Gabriele Bartolini wrote:
>> CREATE TABLE pt (
>>   id INTEGER PRIMARY KEY,
>>   ...
>> );
>>
>> CREATE TABLE ft (
>>   id SERIAL PRIMARY KEY,
>>   pids INTEGER[] REFERENCES pt,
>>   ...
>> );
>
> This seems useful.
>

will be supported situation

CREATE TABLE main(
id int[] PRIMARY KEY,
...
);

CREATE TABLE child(
main_id int[] REFERENCES main(id),

??

Regards

Pavel Stehule

> I'm assuming the SQL spec says nothing about a feature like this?
>
>> This patch is for discussion and has been built against HEAD.
>> It compiles and passes all regressions tests (including specific ones -
>> see the src/test/regress/sql/foreign_key.sql file).
>> Empty arrays, multi-dimensional arrays, duplicate elements and NULL
>> values are allowed.
>
> With this patch, RI_Initial_Check does not detect a violation in an array that
> contains at least one conforming element:
>
> BEGIN;
> CREATE TABLE parent (c int PRIMARY KEY);
> CREATE TABLE child  (c int[]);
> INSERT INTO parent VALUES (1);
> INSERT INTO child VALUES ('{3,1,2}');
> ALTER TABLE child ADD FOREIGN KEY (c) REFERENCES parent; -- should error
> INSERT INTO child VALUES ('{3,1,2}'); -- does error, as expected
> ROLLBACK;
>
> The error message DETAIL on constraint violation would benefit from
> array-FK-specific language.  Example of current message:
>
> ERROR:  insert or update on table "child" violates foreign key constraint "child_c_fkey"
> DETAIL:  Key (c)=({3,1,2}) is not present in table "parent".
>
>
> The patch is missing a change to the code that does FK=FK checks when a user
> updates the FK side:
>
> \set VERBOSITY verbose
> BEGIN;
> CREATE TABLE parent (c int PRIMARY KEY);
> CREATE TABLE child  (c int[] REFERENCES parent);
> INSERT INTO parent VALUES (1);
> INSERT INTO child VALUES ('{1,1}');
> COMMIT;
> -- ERROR:  XX000: no conversion function from integer[] to integer
> -- LOCATION:  ri_HashCompareOp, ri_triggers.c:4097
> UPDATE child SET c = '{1,1}';
> DROP TABLE parent, child;
> COMMIT;
>
> Please audit each ri_triggers.c entry point for further problems like this.
>
>> We had to enforce some limitations, due to the lack (yet) of a clear and
>> universally accepted behaviour and strategy.
>> For example, consider the ON DELETE action on the above tables: in case
>> of delete of a record in the 'pt' table, should we remove the whole row
>> or just the values from the array?
>> We hope we can start a discussion from here.
>
> Removing values from the array seems best to me.  There's no doubt about what
> ON UPDATE CASCADE should do, and having ON DELETE CASCADE excise individual
> array elements is consistent with that.  It's less clear for SET NULL, but I'd
> continue with a per-element treatment.  I'd continue to forbid SET DEFAULT.
>
> However, Jeff Davis did expect ON DELETE CASCADE to remove entire rows:
> http://archives.postgresql.org/message-id/1288119207.15279.24.camel@jdavis-ux.asterdata.local
> So, perhaps the behavior needs to be user-selectable.
>
>> Current limitations:
>>
>> * Only arrays of the same type as the primary key in the referenced
>> table are supported
>
> This is understandable for a WIP, but the final patch will need to use our
> existing, looser foreign key type match requirement.
>
>> * multi-column foreign keys are not supported (only single column)
>
> Any particular reason for this?
>
>> *** a/doc/src/sgml/ddl.sgml
>> --- b/doc/src/sgml/ddl.sgml
>> ***************
>> *** 764,769 **** CREATE TABLE order_items (
>> --- 764,796 ----
>>       the last table.
>>      </para>
>>
>> +    <para>
>> +     Another option you have with foreign keys is to use a referencing column
>> +     which is an array of elements with the same type as the referenced column
>> +     in the related table. This feature, also known as <firstterm>foreign key arrays</firstterm>,
>> +     is described in the following example:
>
> Please wrap your documentation paragraphs.
>
>> *** a/src/backend/commands/tablecmds.c
>> --- b/src/backend/commands/tablecmds.c
>> ***************
>> *** 5705,5710 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
>> --- 5705,5735 ----
>>               Oid                     ffeqop;
>>               int16           eqstrategy;
>>
>> +             /* Check if foreign key is an array of primary key types */
>> +             const bool              is_foreign_key_array = (fktype == get_array_type (pktype));
>
> We don't declare non-pointer, local variables "const".  Also, [not positive on
> this one] when an initial assignment requires a comment, declare the variable
> with no assignment and no comment.  Then, assign it later with the comment.
> This keeps the per-block declarations packed together.
>
>
> This test wrongly rejects FK types that are domains over the array type:
>
> BEGIN;
> CREATE TABLE parent (c int PRIMARY KEY);
> CREATE DOMAIN intarrdom AS int[];
> CREATE TABLE child  (c intarrdom REFERENCES parent);
> ROLLBACK;
>
>> +
>> +             /* Enforce foreign key array restrictions */
>> +             if (is_foreign_key_array)
>> +             {
>> +                     /*
>> +                      * Foreign key array must not be part of a multi-column foreign key
>> +                      */
>> +                     if (is_foreign_key_array && numpks > 1)
>> +                             ereport(ERROR,
>> +                                     (errcode(ERRCODE_INVALID_FOREIGN_KEY),
>> +                                     errmsg("foreign key arrays must not be part of a multi-column foreign key")));
>> +
>> +                     /*
>> +                      * We have to restrict foreign key array to NO ACTION and RESTRICT mode
>> +                      * until the behaviour triggered by the other actions is clearer and well defined
>> +                      */
>> +                     if ((fkconstraint->fk_upd_action != FKCONSTR_ACTION_NOACTION && fkconstraint->fk_upd_action != FKCONSTR_ACTION_RESTRICT)
>> +                             || (fkconstraint->fk_del_action != FKCONSTR_ACTION_NOACTION && fkconstraint->fk_del_action != FKCONSTR_ACTION_RESTRICT))
>
> Break these lines to keep things within 78 columns.  Audit the remainder of
> your changes for long lines, and break when in doubt.
>
>> +                             ereport(ERROR,
>> +                                     (errcode(ERRCODE_INVALID_FOREIGN_KEY),
>> +                                     errmsg("NO ACTION and RESTRICT are the only supported actions for foreign key arrays")));
>
> Error message constants can remain unbroken, though.
>
>> +             }
>> +
>>               /* We need several fields out of the pg_opclass entry */
>>               cla_ht = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclasses[i]));
>>               if (!HeapTupleIsValid(cla_ht))
>> ***************
>> *** 5766,5772 **** ATAddForeignKeyConstraint(AlteredTableInfo *tab, Relation rel,
>>                       Oid                     target_typeids[2];
>>
>>                       input_typeids[0] = pktype;
>> !                     input_typeids[1] = fktype;
>>                       target_typeids[0] = opcintype;
>>                       target_typeids[1] = opcintype;
>>                       if (can_coerce_type(2, input_typeids, target_typeids,
>> --- 5791,5801 ----
>>                       Oid                     target_typeids[2];
>>
>>                       input_typeids[0] = pktype;
>> !                     /* When is FKA we must use for FK the same type of PK */
>> !                     if (is_foreign_key_array)
>> !                             input_typeids[1] = pktype;
>> !                     else
>> !                             input_typeids[1] = fktype;
>>                       target_typeids[0] = opcintype;
>>                       target_typeids[1] = opcintype;
>>                       if (can_coerce_type(2, input_typeids, target_typeids,
>
> This is bogus; the can_coerce_type test will always pass (excluding bad cases
> of catalog inconsistency).
>
> ATAddForeignKeyConstraint should choose to make an array foreign key whenever
> the PK side is a scalar and the FK side is an array.  Then, grab the element
> type of the FK side and feed that through the operator-identification logic.
>
>> *** a/src/backend/utils/adt/ri_triggers.c
>> --- b/src/backend/utils/adt/ri_triggers.c
>> ***************
>> *** 460,465 **** RI_FKey_check(PG_FUNCTION_ARGS)
>> --- 460,466 ----
>>               char            paramname[16];
>>               const char *querysep;
>>               Oid                     queryoids[RI_MAX_NUMKEYS];
>> +             bool            is_foreign_key_array = false;
>>
>>               /* ----------
>>                * The query string built is
>> ***************
>> *** 476,493 **** RI_FKey_check(PG_FUNCTION_ARGS)
>>               {
>>                       Oid                     pk_type = RIAttType(pk_rel, riinfo.pk_attnums[i]);
>>                       Oid                     fk_type = RIAttType(fk_rel, riinfo.fk_attnums[i]);
>>
>>                       quoteOneName(attname,
>>                                                RIAttName(pk_rel, riinfo.pk_attnums[i]));
>>                       sprintf(paramname, "$%d", i + 1);
>> !                     ri_GenerateQual(&querybuf, querysep,
>> !                                                     attname, pk_type,
>> !                                                     riinfo.pf_eq_oprs[i],
>> !                                                     paramname, fk_type);
>>                       querysep = "AND";
>>                       queryoids[i] = fk_type;
>>               }
>> !             appendStringInfo(&querybuf, " FOR SHARE OF x");
>>
>>               /* Prepare and save the plan */
>>               qplan = ri_PlanCheck(querybuf.data, riinfo.nkeys, queryoids,
>> --- 477,524 ----
>>               {
>>                       Oid                     pk_type = RIAttType(pk_rel, riinfo.pk_attnums[i]);
>>                       Oid                     fk_type = RIAttType(fk_rel, riinfo.fk_attnums[i]);
>> +                     is_foreign_key_array = (fk_type == get_array_type (pk_type));
>
> Drop the extra whitespace before the function argument list.
>
>>
>>                       quoteOneName(attname,
>>                                                RIAttName(pk_rel, riinfo.pk_attnums[i]));
>>                       sprintf(paramname, "$%d", i + 1);
>> !                     /*
>> !                      * In case of an array foreign key, we check that every
>> !                      * DISTINCT NOT NULL value in the array is present in the PK table.
>> !                      * XXX: This works because the query is executed with LIMIT 1,
>
> I found this comment confusing, since the SQL syntax "LIMIT 1" is never used
> here.  I suppose you're referring to the fact that we call into SPI with
> tcount = 1?
>
>> !                      * but may not work properly with SSI (a better approach would be
>> !                      * to inspect the array and skip the check in case of empty arrays).
>
> Why might serializable transactions be specially affected?
>
>> !                      */
>> !                     if (is_foreign_key_array)
>> !                     {
>> !                             appendStringInfo(&querybuf, " %s (SELECT count(*) FROM (SELECT DISTINCT UNNEST(%s)) y WHERE y IS NOT NULL)", querysep, paramname);
>> !                             appendStringInfo(&querybuf, " = (SELECT count(*) FROM (SELECT 1 FROM ONLY %s y",  pkrelname);
>> !                             ri_GenerateQual(&querybuf, "WHERE",
>> !                                                             attname, pk_type,
>> !                                                             riinfo.pf_eq_oprs[i],
>> !                                                             paramname, fk_type);
>> !                             /*
>> !                              * We lock for share every row in the pkreltable that is
>> !                              * referenced by the array elements
>> !                              */
>> !                             appendStringInfo(&querybuf, " FOR SHARE OF y) z)");
>
> The resulting query performs an irrelevant sequential scan on the PK table:
>
> SELECT 1 FROM ONLY "public"."parent" x WHERE (SELECT count(*) FROM (SELECT DISTINCT UNNEST($1)) y WHERE y IS NOT NULL) = (SELECT count(*) FROM (SELECT 1 FROM ONLY "public"."parent" y WHERE "c" OPERATOR(pg_catalog.=) ANY ($1) FOR SHARE OF y) z)
>
> As you suggested with that comment above, this scan always ends after one row.
> That places a bound on the actual performance hit.  However, we still read the
> one row, which may mean loading a page for nothing.  At a minimum, simplify
> this query to:
>
> SELECT 1                               WHERE (SELECT count(*) FROM (SELECT DISTINCT UNNEST($1)) y WHERE y IS NOT NULL) = (SELECT count(*) FROM (SELECT 1 FROM ONLY "public"."parent" y WHERE "c" OPERATOR(pg_catalog.=) ANY ($1) FOR SHARE OF y) z)
>
> That also naturally handles empty arrays against empty PK tables, which
> currently fail for me even at READ COMMITTED:
>
> BEGIN;
> CREATE TABLE parent (c int PRIMARY KEY);
> CREATE TABLE child  (c int[] REFERENCES parent);
> INSERT INTO child VALUES ('{}'); -- fails wrongly
> ROLLBACK;
>
>> !                     }
>> !                     else
>> !                     {
>> !                             ri_GenerateQual(&querybuf, querysep,
>> !                                                             attname, pk_type,
>> !                                                             riinfo.pf_eq_oprs[i],
>> !                                                             paramname, fk_type);
>> !                     }
>>                       querysep = "AND";
>>                       queryoids[i] = fk_type;
>>               }
>> !             /*
>> !              * We skip locking for share in case of foreign key arrays
>> !              * as it has been done in the inner subquery
>> !              */
>> !             if (! is_foreign_key_array)
>
> Drop the whitespace after the "!".
>
>> !                     appendStringInfo(&querybuf, " FOR SHARE OF x");
>>
>>               /* Prepare and save the plan */
>>               qplan = ri_PlanCheck(querybuf.data, riinfo.nkeys, queryoids,
>
>> *** a/src/test/regress/expected/foreign_key.out
>> --- b/src/test/regress/expected/foreign_key.out
>> ***************
>> *** 968,978 **** drop table pktable;
>>   drop table pktable_base;
>>   -- 2 columns (1 table), mismatched types
>>   create table pktable_base(base1 int not null, base2 int);
>> - create table pktable(ptest1 inet, ptest2 inet[], primary key(base1, ptest1), foreign key(base2, ptest2) references
>> -                                              pktable(base1, ptest1)) inherits (pktable_base);
>> - NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "pktable_pkey" for table "pktable"
>> - ERROR:  foreign key constraint "pktable_base2_fkey" cannot be implemented
>> - DETAIL:  Key columns "ptest2" and "ptest1" are of incompatible types: inet[] and inet.
>
> Instead of deleting this test, change the type from inet[] to something
> unrelated, like float8.
>
> Thanks,
> nm
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers(at)postgresql(dot)org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2011-11-17 05:03:56 Re: ISN was: Core Extensions relocation
Previous Message Pavan Deolasee 2011-11-17 04:49:58 Re: FlexLocks