Re: SQL:2011 application time

From: Paul Jungwirth <pj(at)illuminatedcomputing(dot)com>
To: jian he <jian(dot)universality(at)gmail(dot)com>, Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: SQL:2011 application time
Date: 2024-01-18 03:59:06
Message-ID: 0af6a323-f026-4cd9-adf1-941038cd00de@illuminatedcomputing.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello,

Here are new patches consolidating feedback from several emails.
I haven't addressed everything but I think I'm overdue for a reply:

On 1/4/24 21:06, jian he wrote:
>
> I am confused.
> say condition: " (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0)"
> the following code will only run PartA, never run PartB?
>
> `
> else if (cmp_l1l2 >= 0 && cmp_u1u2 <= 0)
> PartA
> else if (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0)
> PartB
> `
>
> minimum example:
> #include<stdio.h>
> #include<string.h>
> #include<stdlib.h>
> #include<assert.h>
> int
> main(void)
> {
> int cmp_l1l2;
> int cmp_u1u2;
> int cmp_u1l2;
> int cmp_l1u2;
> cmp_l1u2 = -1;
> cmp_l1l2 = 0;
> cmp_u1u2 = 0;
> cmp_u1l2 = 0;
> assert(cmp_u1l2 == 0);
> if (cmp_l1u2 > 0 || cmp_u1l2 < 0)
> printf("calling partA\n");
> else if (cmp_l1l2 >= 0 && cmp_u1u2 <= 0)
> printf("calling partB\n");
> else if (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0)
> printf("calling partC\n");
> }

All of the branches are used. I've attached a `without_portion.c` minimal example showing different
cases. For ranges it helps to go through the Allen relationships
(https://en.wikipedia.org/wiki/Allen%27s_interval_algebra) to make a comprehensive check. (But note
that our operators don't exactly match that terminology, and it's important to consider
closed-vs-open and unbounded cases.)

> I am confused with the name "range_without_portion", I think
> "range_not_overlap" would be better.

I think I covered this in my other reply and we are now in agreement, but if that's mistaken let
know me.

> select numrange(1.1, 2.2) @- numrange(2.0, 3.0);
> the result is not the same as
> select numrange(2.0, 3.0) @- numrange(1.1, 2.2);

Correct, @- is not commutative.

> So your categorize oprkind as 'b' for operator "@-" is wrong?
> select oprname,oprkind,oprcanhash,oprcanmerge,oprleft,oprright,oprresult,oprcode
> from pg_operator
> where oprname = '@-';

'b' is the correct oprkind. It is a binary (infix) operator.

> aslo
> select count(*), oprkind from pg_operator group by oprkind;
> there are only 5% are prefix operators.
> maybe we should design it as:
> 1. if both inputs are empty range, the result array is empty.
> 2. if both inputs are non-empty and never overlaps, put both of them
> to the result array.
> 3. if one input is empty another one is not, then put the non-empty
> one into the result array.

Also covered before, but if any of this still applies please let me know.

> after applying the patch: now the catalog data seems not correct to me.
> SELECT a1.amopfamily
> ,a1.amoplefttype::regtype
> ,a1.amoprighttype
> ,a1.amopstrategy
> ,amoppurpose
> ,amopsortfamily
> ,amopopr
> ,op.oprname
> ,am.amname
> FROM pg_amop as a1 join pg_operator op on op.oid = a1.amopopr
> join pg_am am on am.oid = a1.amopmethod
> where amoppurpose = 'p';
> output:
> amopfamily | amoplefttype | amoprighttype | amopstrategy |
> amoppurpose | amopsortfamily | amopopr | oprname | amname
>
------------+---------------+---------------+--------------+-------------+----------------+---------+---------+--------
> 2593 | box | 603 | 31 | p
> | 0 | 803 | # | gist
> 3919 | anyrange | 3831 | 31 | p
> | 0 | 3900 | * | gist
> 6158 | anymultirange | 4537 | 31 | p
> | 0 | 4394 | * | gist
> 3919 | anyrange | 3831 | 32 | p
> | 0 | 8747 | @- | gist
> 6158 | anymultirange | 4537 | 32 | p
> | 0 | 8407 | @- | gist
> (5 rows)
>
> select oprcode, oprname, oprleft::regtype
> from pg_operator opr
> where opr.oprname in ('#','*','@-')
> and oprleft = oprright
> and oprleft in (603,3831,4537);
> output:
>
> oprcode | oprname | oprleft
> ----------------------------+---------+---------------
> box_intersect | # | box
> range_intersect | * | anyrange
> multirange_intersect | * | anymultirange
> range_without_portion | @- | anyrange
> multirange_without_portion | @- | anymultirange
> (5 rows)

This seems correct. '#' is the name of the box overlaps operator. Probably I should add a box @-
operator too. But see below. . . .

> should amoppurpose = 'p' is true apply to ' @-' operator?

Yes.

> catalog-pg-amop.html:
> `
> amopsortfamily oid (references pg_opfamily.oid):
> The B-tree operator family this entry sorts according to, if an
> ordering operator; zero if a search operator
> `
> you should also update the above entry, the amopsortfamily is also
> zero for "portion operator" for the newly implemented "portion
> operator".

Okay, done.

> v21-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
> create mode 100644 src/backend/utils/adt/period.c
> create mode 100644 src/include/utils/period.h
> you should put these two files to v21-0008-Add-PERIODs.patch.
> it's not related to that patch, it also makes people easy to review.

You're right, sorry!

On 1/8/24 16:00, jian he wrote:
>
> +/*
> + * ForPortionOfClause
> + * representation of FOR PORTION OF <period-name> FROM <ts> TO <te>
> + * or FOR PORTION OF <period-name> (<target>)
> + */
> +typedef struct ForPortionOfClause
> +{
> + NodeTag type;
> + char *range_name;
> + int range_name_location;
> + Node *target;
> + Node *target_start;
> + Node *target_end;
> +} ForPortionOfClause;
>
> "range_name_location" can be just "location"?
> generally most of the struct put the "location" to the last field in the struct.
> (that's the pattern I found all over other code)

Agreed, done.

> + if (isUpdate)
> + {
> + /*
> + * Now make sure we update the start/end time of the record.
> + * For a range col (r) this is `r = r * targetRange`.
> + */
> + Expr *rangeSetExpr;
> + TargetEntry *tle;
> +
> + strat = RTIntersectStrategyNumber;
> + GetOperatorFromCanonicalStrategy(opclass, InvalidOid, "intersects",
> "FOR PORTION OF", &opid, &strat);
> + rangeSetExpr = (Expr *) makeSimpleA_Expr(AEXPR_OP, get_opname(opid),
> + (Node *) copyObject(rangeVar), targetExpr,
> + forPortionOf->range_name_location);
> + rangeSetExpr = (Expr *) transformExpr(pstate, (Node *) rangeSetExpr,
> EXPR_KIND_UPDATE_PORTION);
> +
> + /* Make a TLE to set the range column */
> + result->rangeSet = NIL;
> + tle = makeTargetEntry(rangeSetExpr, range_attno, range_name, false);
> + result->rangeSet = lappend(result->rangeSet, tle);
> +
> + /* Mark the range column as requiring update permissions */
> + target_perminfo->updatedCols = bms_add_member(target_perminfo->updatedCols,
> + range_attno - FirstLowInvalidHeapAttributeNumber);
> + }
> + else
> + result->rangeSet = NIL;
> I think the name "rangeSet" is misleading, since "set" is generally
> related to a set of records.
> but here it's more about the "range intersect".

Okay, I can see that. I used "rangeSet" because we add it to the SET clause of the UPDATE command.
Here I've changed it to rangeTargetList. I think this matches other code and better indicates what
it holds. Any objections?

In the PERIOD patch we will need two TLEs here (that's why it's a List): one for the start column
and one for the end column.

> in ExecDelete
> we have following code pattern:
> ExecDeleteEpilogue(context, resultRelInfo, tupleid, oldtuple, changingPart);
> if (processReturning && resultRelInfo->ri_projectReturning)
> {
> ....
> if (!table_tuple_fetch_row_version(resultRelationDesc, tupleid,
> SnapshotAny, slot))
> elog(ERROR, "failed to fetch deleted tuple for DELETE RETURNING");
> }
> }
>
> but the ExecForPortionOfLeftovers is inside ExecDeleteEpilogue.
> meaning even without ExecForPortionOfLeftovers, we can still call
> table_tuple_fetch_row_version
> also if it was *not* concurrently updated, then our current process
> holds the lock until the ending of the transaction, i think.
> So the following TODO is unnecessary?
>
> + /*
> + * Get the range of the old pre-UPDATE/DELETE tuple,
> + * so we can intersect it with the FOR PORTION OF target
> + * and see if there are any "leftovers" to insert.
> + *
> + * We have already locked the tuple in ExecUpdate/ExecDelete
> + * (TODO: if it was *not* concurrently updated, does
> table_tuple_update lock the tuple itself?
> + * I don't found the code for that yet, and maybe it depends on the AM?)
> + * and it has passed EvalPlanQual.
> + * Make sure we're looking at the most recent version.
> + * Otherwise concurrent updates of the same tuple in READ COMMITTED
> + * could insert conflicting "leftovers".
> + */
> + if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
> tupleid, SnapshotAny, oldtupleSlot))
> + elog(ERROR, "failed to fetch tuple for FOR PORTION OF");

I think you're right. According to the comments on TM_Result (returned by table_tuple_update), a
TM_Ok indicates that the lock was acquired.

> +/* ----------------------------------------------------------------
> + * ExecForPortionOfLeftovers
> + *
> + * Insert tuples for the untouched timestamp of a row in a FOR
> + * PORTION OF UPDATE/DELETE
> + * ----------------------------------------------------------------
> + */
> +static void
> +ExecForPortionOfLeftovers(ModifyTableContext *context,
> + EState *estate,
> + ResultRelInfo *resultRelInfo,
> + ItemPointer tupleid)
>
> maybe change the comment to
> "Insert tuples for the not intersection of a row in a FOR PORTION OF
> UPDATE/DELETE."

Changed to "untouched portion".

> + deconstruct_array(DatumGetArrayTypeP(allLeftovers),
> typcache->type_id, typcache->typlen,
> + typcache->typbyval, typcache->typalign, &leftovers, NULL, &nleftovers);
> +
> + if (nleftovers > 0)
> + {
> I think add something like assert nleftovers >=0 && nleftovers <= 2
> (assume only range not multirange) would improve readability.

I added the first assert. The second is not true for non-range types.

> + <para>
> + If the table has a range column or
> + <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>,
> + you may supply a <literal>FOR PORTION OF</literal> clause, and
> your delete will
> + only affect rows that overlap the given interval. Furthermore, if
> a row's span
> + extends outside the <literal>FOR PORTION OF</literal> bounds, then
> your delete
> + will only change the span within those bounds. In effect you are
> deleting any
> + moment targeted by <literal>FOR PORTION OF</literal> and no moments outside.
> + </para>
> +
> + <para>
> + Specifically, after <productname>PostgreSQL</productname> deletes
> the existing row,
> + it will <literal>INSERT</literal>
> + new rows whose range or start/end column(s) receive the remaining
> span outside
> + the targeted bounds, containing the original values in other columns.
> + There will be zero to two inserted records,
> + depending on whether the original span extended before the targeted
> + <literal>FROM</literal>, after the targeted <literal>TO</literal>,
> both, or neither.
> + </para>
> +
> + <para>
> + These secondary inserts fire <literal>INSERT</literal> triggers. First
> + <literal>BEFORE DELETE</literal> triggers first, then
> + <literal>BEFORE INSERT</literal>, then <literal>AFTER INSERT</literal>,
> + then <literal>AFTER DELETE</literal>.
> + </para>
> +
> + <para>
> + These secondary inserts do not require <literal>INSERT</literal>
> privilege on the table.
> + This is because conceptually no new information has been added.
> The inserted rows only preserve
> + existing data about the untargeted time period. Note this may
> result in users firing <literal>INSERT</literal>
> + triggers who don't have insert privileges, so be careful about
> <literal>SECURITY DEFINER</literal> trigger functions!
> + </para>
>
> I think you need to wrap them into a big paragraph, otherwise they
> lose the context?
> please see the attached build sql-update.html.

Still TODO.

> also I think
> + <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>,
> should shove into Add-PERIODs.patch.
>
> otherwise you cannot build Add-UPDATE-DELETE-FOR-PORTION-OF.patch
> without all the patches.

Fixed.

> I think the "FOR-PORTION-OF" feature is kind of independ?
> Because, IMHO, "for portion" is a range datum interacting with another
> single range datum, but the primary key with "WITHOUT OVERLAPS", is
> range datum interacting with a set of range datums.
> now I cannot just git apply v22-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch.
> That maybe would make it more difficult to get commited?

Still TODO.

On 1/8/24 21:33, jian he wrote:
>
> src5=# select range_without_portion(numrange(1.0,3.0,'[]'),
> numrange(1.5,2.0,'(]'));
> range_without_portion
> ---------------------------
> {"[1.0,1.5]","(2.0,3.0]"}
> (1 row)
>
> src5=# \gdesc
> Column | Type
> -----------------------+-----------
> range_without_portion | numeric[]
> (1 row)
>
> src5=# \df range_without_portion
> List of functions
> Schema | Name | Result data type | Argument data
> types | Type
> ------------+-----------------------+------------------+---------------------+------
> pg_catalog | range_without_portion | anyarray | anyrange,
> anyrange | func
> (1 row)
>
> so apparently, you cannot from (anyrange, anyrange) get anyarray the
> element type is anyrange.
> I cannot find the documented explanation in
> https://www.postgresql.org/docs/current/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC
>
> anyrange is POLYMORPHIC, anyarray is POLYMORPHIC,
> but I suppose, getting an anyarray the element type is anyrange would be hard.

You're right, that is a problem.

I think the right approach is to make intersect and without_portion just be support functions, not
operators. Then I don't need to introduce the new 'p' amop strategy at all, which seemed like a
dubious idea anyway. Then the without_portion function can return a SETOF instead of an array.

Another idea is to add more polymorphic types, anyrangearray and anymultirangearray, but maybe that
is too big a thing. OTOH I have wanted those same types before. I will take a stab at it.

On 1/11/24 06:44, Peter Eisentraut wrote:
> Here is some more detailed review of the first two patches. (I reviewed v20; I see you have also
> posted v21, but they don't appear very different for this purpose.)
>
> v20-0001-Add-stratnum-GiST-support-function.patch
>
> * contrib/btree_gist/Makefile
>
> Needs corresponding meson.build updates.

Fixed.

> * contrib/btree_gist/btree_gist--1.7--1.8.sql
>
> Should gist_stratnum_btree() live in contrib/btree_gist/ or in core?
> Are there other extensions that use the btree strategy numbers for
> gist?

Moved. None of our other contrib extensions use it. I thought it would be friendly to offer it to
outside extensions, but maybe that is too speculative.

> +ALTER OPERATOR FAMILY gist_vbit_ops USING gist ADD
> + FUNCTION 12 (varbit, varbit) gist_stratnum_btree (int2) ;
>
> Is there a reason for the extra space after FUNCTION here (repeated
> throughout the file)?

Fixed.

> +-- added in 1.4:
>
> What is the purpose of these "added in" comments?

I added those to help me make sure I was including every type in the extension, but I've taken them
out here.

> v20-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch
>
> * contrib/btree_gist/Makefile
>
> Also update meson.build.

Done.

> * contrib/btree_gist/sql/without_overlaps.sql
>
> Maybe also insert a few values, to verify that the constraint actually
> does something?

Done.

> * doc/src/sgml/ref/create_table.sgml
>
> Is "must have a range type" still true? With the changes to the
> strategy number mapping, any type with a supported operator class
> should work?

Updated. Probably more docs to come; I want to go through them all now that we support more types.

> * src/backend/utils/adt/ruleutils.c
>
> Is it actually useful to add an argument to
> decompile_column_index_array()? Wouldn't it be easier to just print
> the " WITHOUT OVERLAPS" in the caller after returning from it?

Okay, done.

> * src/include/access/gist_private.h
>
> The added function gistTranslateStratnum() isn't really "private" to
> gist. So access/gist.h would be a better place for it.

Moved.

> Also, most other functions there appear to be named "GistSomething",
> so a more consistent name might be GistTranslateStratnum.
>
> * src/include/access/stratnum.h

Changed.

> The added StrategyIsValid() doesn't seem that useful? Plenty of
> existing code just compares against InvalidStrategy, and there is only
> one caller for the new function. I suggest to do without it.
>
> * src/include/commands/defrem.h

Okay, removed.

> We are using two terms here, well-known strategy number and canonical
> strategy number, to mean the same thing (I think?). Let's try to
> stick with one. Or explain the relationship?

True. Changed everything to "well-known" which seems like a better match for what's going on.

I haven't gone through jian he's Jan 13 patch yet, but since he was also implementing Peter's
requests I thought I should share what I have. I did this work a while ago, but I was hoping to
finish the TODOs above first, and then we got hit with a winter storm that knocked out power. Sorry
to cause duplicate work!

Rebased to 2f35c14cfb.

Yours,

--
Paul ~{:-)
pj(at)illuminatedcomputing(dot)com

Attachment Content-Type Size
v23-0001-Add-stratnum-GiST-support-function.patch text/x-patch 20.8 KB
v23-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch text/x-patch 76.2 KB
v23-0003-Add-GiST-referencedagg-support-func.patch text/x-patch 9.9 KB
v23-0004-Add-temporal-FOREIGN-KEYs.patch text/x-patch 148.9 KB
v23-0005-Add-multi-range_without_portion-proc-operator.patch text/x-patch 17.6 KB
v23-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch text/x-patch 152.1 KB
v23-0007-Add-CASCADE-SET-NULL-SET-DEFAULT-for-temporal-fo.patch text/x-patch 112.0 KB
v23-0008-Add-PERIODs.patch text/x-patch 133.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David Rowley 2024-01-18 04:08:33 Re: Strange Bitmapset manipulation in DiscreteKnapsack()
Previous Message Shubham Khanna 2024-01-18 03:54:36 Re: Fix search_path for all maintenance commands