Re: GSoC 2017: Foreign Key Arrays

From: Andreas Karlsson <andreas(at)proxel(dot)se>
To: Mark Rofail <markm(dot)rofail(at)gmail(dot)com>
Cc: Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, David Steele <david(at)pgmasters(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Erik Rijkers <er(at)xs4all(dot)nl>, Robert Haas <robertmhaas(at)gmail(dot)com>, pgsql-hackers <pgsql-hackers(at)postgresql(dot)org>, pgsql-hackers-owner(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Hans-Jürgen Schönig <hs(at)cybertec(dot)at>
Subject: Re: GSoC 2017: Foreign Key Arrays
Date: 2017-10-30 01:24:06
Message-ID: 76a8d3d8-a22e-3299-7c4e-6e115dbf3762@proxel.se
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Sorry for the very late review.

I like this feature and have needed it myself in the past, and the
current syntax seems pretty good. One could argue for if the syntax
could be generalized to support other things like json and hstore, but I
do not think it would be fair to block this patch due to that.

== Limitations of the current design

1) Array element foreign keys can only be specified at the table level
(not at columns): I think this limitation is fine. Other PostgreSQL
specific features like exclusion contraints can also only be specified
at the table level.

2) Lack of support for SET NULL and SET DEFAULT: these do not seem very
useful for arrays.

3) Lack of support for specifiying multiple arrays in the foreign key:
seems like a good thing to me since it is not obvious what such a thing
even would do.

4) That you need to add a cast to the index if you have different types:
due to there not being a int4[] <@ int2[] operator you need to add an
index on (col::int4[]) to speed up deletes and updates. This one i
annoying since EXPLAIN wont give you the query plans for the foreign key
queries, but I do not think fixing this should be within the scope of
the patch and that having a smaller interger in the referring table is rare.

5) The use of count(DISTINCT) requiring types to support btree equality:
this has been discussed a lot up-thread and I think the current state is
good enough.

== Functional review

I have played around some with it and things seem to work and the test
suite passes, but I noticed a couple of strange behaviors.

1) MATCH FULL does not seem to care about NULLS in arrays. In the
example below I expected both inserts into the referring table to fail.

CREATE TABLE t (x int, y int, PRIMARY KEY (x, y));
CREATE TABLE fk (x int, ys int[], FOREIGN KEY (x, EACH ELEMENT OF ys)
REFERENCES t MATCH FULL);
INSERT INTO t VALUES (10, 1);
INSERT INTO fk VALUES (10, '{1,NULL}');
INSERT INTO fk VALUES (NULL, '{1}');

CREATE TABLE
CREATE TABLE
INSERT 0 1
INSERT 0 1
ERROR: insert or update on table "fk" violates foreign key constraint
"fk_x_fkey"
DETAIL: MATCH FULL does not allow mixing of null and nonnull key values.

2) To me it was not obvious that ON DELETE CASCADE would delete the
whole rows rather than delete the members from the array, and this kind
of misunderstanding can lead to pretty bad surprises in production. I am
leaning towards not supporting CASCADE.

== The @>> operator

A previous version of your patch added the "anyelement <<@ anyarray"
operator to avoid having to build arrays, but that part was reverted due
to a bug.

I am not expert on the gin code, but as far as I can tell it would be
relatively simple to fix that bug. Just allocate an array of Datums of
length one where you put the element you are searching for (or maybe a
copy of it).

Potential issues with adding the operators:

1) Do we really want to add an operator just for array element foreign
keys? I think this is not an issue since it seems like it should be
useful in general. I know I have wanted it myself.

2) I am not sure, but the committers might prefer if adding the
operators is done in a separate patch.

3) Bikeshedding about operator names. I personally think @>> is clear
enough and as far as I know it is not used for anything else.

== Code review

The patch no longer applies to HEAD, but the conflicts are small.

I think we should be more consistent in the naming, both in code and in
the documentation. Right now we have "array foreign keys", "element
foreign keys", "ELEMENT foreign keys", etc.

+ /*
+ * If this is an array foreign key, we must look up the
operators for
+ * the array element type, not the array type itself.
+ */
+ if (fkreftypes[i] != FKCONSTR_REF_PLAIN)

+ if (fkreftypes[i] != FKCONSTR_REF_PLAIN)
+ {
+ old_fktype = get_base_element_type(old_fktype);
+ /* this shouldn't happen ... */
+ if (!OidIsValid(old_fktype))
+ elog(ERROR, "old foreign key column is not an array");
+ }

+ if (riinfo->fk_reftypes[i] != FKCONSTR_REF_PLAIN)
+ {
+ riinfo->has_array = true;
+ riinfo->ff_eq_oprs[i] = ARRAY_EQ_OP;
+ }

In the three diffs above it would be much cleaner to check for "==
FKCONSTR_REF_EACH_ELEMENT" since that better conveys the intent and is
safer for adding new types in the future.

+ /* We look through any domain here */
+ fktype = get_base_element_type(fktype);

What does the comment above mean?

if (!(OidIsValid(pfeqop) && OidIsValid(ffeqop)))
ereport(ERROR,
(errcode(ERRCODE_DATATYPE_MISMATCH),
- errmsg("foreign key constraint \"%s\" "
- "cannot be implemented",
- fkconstraint->conname),
- errdetail("Key columns \"%s\" and \"%s\" "
- "are of incompatible types: %s and %s.",
+ errmsg("foreign key constraint \"%s\" cannot be
implemented",
+ fkconstraint->conname),
+ errdetail("Key columns \"%s\" and \"%s\" are of
incompatible types: %s and %s.",
strVal(list_nth(fkconstraint->fk_attrs, i)),
strVal(list_nth(fkconstraint->pk_attrs, i)),
- format_type_be(fktype),
+ format_type_be(fktypoid[i]),
format_type_be(pktype))));

The above diff looks like pointless code churn which possibly introduces
a bug in how it changed fktype to fktypoid[i].

I think the code in RI_Initial_Check() would be cleaner if you used
"CROSS JOIN LATERAL unnest(col)" rather than having unnest() in the
target list. This way you would not need to rename all columns and the
code paths for the array case could look more like the code path for the
normal case.

== Minor things in the code

+ {
+ pk_attrs = lappend(pk_attrs, arg);
+ fk_reftypes = lappend_int(fk_reftypes, FKCONSTR_REF_PLAIN);
+ }

This is incorrectly indented.

I suggest that you should only allocate countbuf in RI_FKey_check() if
has_array is set.

I think the code would be more readable if both branches of
ri_GenerateQual() used the same pattern for whitespace so the
differences are easier to spot.

You can use DROP TABLE IF EXISTS to silence the "ERROR: table
"fktableforarray" does not exist" errors.

I am not sure that you need to test all combinations of ON UPDATE and ON
DELETE. I think it should be enough to test one of each ON UPDATE and
one of each ON DELETE should be enough.

+-- Create the foreign table

It is probably a bad idea to call the referencing table the foreign table.

+-- Repeat a similar test using INT4 keys coerced from INT2

This comment is repeated twice in the test suite.

== Documentation

Remove the ELEMENT REFERENCES form from
doc/src/sgml/ref/create_table.sgml since we no longer support it.

- FOREIGN KEY ( <replaceable
class="parameter">column_name</replaceable> [, ... ] ) REFERENCES
<replaceable class="parameter">reftable</replaceable> [ ( <replaceable
class="parameter">refcolumn</replaceable> [, ... ] ) ]
+ FOREIGN KEY ( [ELEMENT] <replaceable
class="parameter">column_name</replaceable> [, ... ] ) REFERENCES
<replaceable class="parameter">reftable</replaceable> [ ( <replaceable
class="parameter">refcolumn</replaceable> [, ... ] ) ]

Change this to "FOREIGN KEY ( [EACH ELEMENT OF] ...".

- <term><literal>FOREIGN KEY ( <replaceable
class="parameter">column_name</replaceable> [, ... ] )
+ <term><literal>FOREIGN KEY ( [ELEMENT] <replaceable
class="parameter">column_name</replaceable> [, ... ] )

Change this to "... FOREIGN KEY ( [EACH ELEMENT OF] ...".

+ <literal>ELEMENT</literal> keyword and <replaceable

Change this to "... <literal>EACH ELEMENT OF</literal> ...". Maybe you
should also fix other instances of ELEMENT in the same paragraph but
these are less obvious.

You should remove the "ELEMENT REFERENCES" section of
doc/src/sgml/ref/create_table.sgml, and move any still relevant bits
elsewhere since we do not support this syntax.

The sql-createtable-foreign-key-arrays section need to be updated to
remove references to "ELEMENT REFERENCES".

Your indentation in doc/src/sgml/catalogs.sgml is two spaces but the
rest of the file looks like it uses 1 space indentation. Additionally
you seem to have accidentally reindented something which was correctly
indented.

Same with the idnentation in doc/src/sgml/ddl.sgml and
doc/src/sgml/ref/create_table.sgml.

- <varlistentry>
+ <varlistentry id="sql-createtable-foreign-key" xreflabel="FOREIGN KEY">

You have accidentally reindented this in doc/src/sgml/ref/create_table.sgml.

The paragraph in doc/src/sgml/ref/create_table.sgml which starts with
"In case the column name" seems to actually be multiple paragraphs. Is
that intentional or a mistake?

The documentation in doc/src/sgml/ddl.sgml mentions that "it must be
written in table constraint form" for when you have multiple columns,
but I feel this is just redundant and confusing since this is always
true, both for array foreign keys and for when you have multiple columns.

The documentation in doc/src/sgml/ddl.sgml should mention that we only
support one array reference per foreign key.

Maybe the documentation in doc/src/sgml/ddl.sgml should mention that we
only support the table constraint form.

Andreas

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andreas Karlsson 2017-10-30 01:35:37 Re: git down
Previous Message Thomas Munro 2017-10-30 00:49:41 Re: Parallel Hash take II