Re: [HACKERS] MERGE SQL Statement for PG11

From: Simon Riggs <simon(at)2ndquadrant(dot)com>
To: Peter Geoghegan <pg(at)bowt(dot)ie>
Cc: Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Date: 2018-01-29 21:34:45
Message-ID: CANP8+j+UWRS-rm0QqFyFZmK3Xd2=EsXsXYO6BgAgxNZry4Hc6w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 29 January 2018 at 20:41, Peter Geoghegan <pg(at)bowt(dot)ie> wrote:
> On Mon, Jan 29, 2018 at 6:11 AM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> New patch attached that correctly handles all known concurrency cases,
>> with expected test output.
>
> This revision, v13, seems much improved in this area.
>
>> This means MERGE will work just fine for "normal" UPDATEs, but it will
>> often fail (deterministically) in concurrent tests with mixed
>> insert/deletes or UPDATEs that touch the PK, as requested.
>
> * While looking at how you're handling concurrency/EPQ now, I noticed this code:
>
>> + /*
>> + * Test condition, if any
>> + *
>> + * In the absence of a condition we perform the action
>> + * unconditionally (no need to check separately since
>> + * ExecQual() will return true if there are no
>> + * conditions to evaluate).
>> + */
>> + if (!ExecQual(action->whenqual, econtext))
>> + {
>> + if (BufferIsValid(buffer))
>> + ReleaseBuffer(buffer);
>> + continue;
>> + }
>
> (As well as its interactions with ExecUpdate() + ExecDelete(), which
> are significant.)
>
> The way that routines like ExecUpdate() interact with MERGE for WHEN
> qual + EPQ handling seems kind of convoluted. I hope for something
> cleaner in the next revision.

Cleaner?

> * This also stood out (not sure if you were referring/alluding to this
> in the quoted text):
>
>> + /*
>> + * If EvalPlanQual did not return a tuple, it means we
>> + * have seen a concurrent delete, or a concurrent update
>> + * where the row has moved to another partition.
>> + *
>> + * UPDATE ignores this case and continues.
>> + *
>> + * If MERGE has a WHEN NOT MATCHED clause we know that the
>> + * user would like to INSERT something in this case, yet
>> + * we can't see the delete with our snapshot, so take the
>> + * safe choice and throw an ERROR. If the user didn't care
>> + * about WHEN NOT MATCHED INSERT then neither do we.
>> + *
>> + * XXX We might consider setting matched = false and loop
>> + * back to lmerge though we'd need to do something like
>> + * EvalPlanQual, but not quite.
>> + */
>> + else if (epqstate->epqresult == EPQ_TUPLE_IS_NULL &&
>> + node->mt_merge_subcommands & ACL_INSERT)
>> + {
>> + /*
>> + * We need to throw a retryable ERROR because of the
>> + * concurrent update which we can't handle.
>> + */
>> + ereport(ERROR,
>> + (errcode(ERRCODE_T_R_SERIALIZATION_FAILURE),
>> + errmsg("could not serialize access due to concurrent update")));
>> + }
>
> I don't think that ERRCODE_T_R_SERIALIZATION_FAILURE is ever okay in
> READ COMMITTED mode.

We use that code already in Hot Standby in READ COMMITTED mode.

What code should it be? It needs to be a retryable errcode.

> Anyway, I wonder why we shouldn't just go ahead
> an do the WHEN NOT MATCHED INSERT on the basis of information which is
> not visible to our snapshot. We choose to not UPDATE/DELETE on the
> basis of information from the future already, within EPQ. My first
> feeling is that this is a distinction without a difference, and you
> should actually go and INSERT at this point, though I reserve the
> right to change my mind about that.
>
> Yeah, EPQ semantics are icky, but would actually inserting here really
> be any worse than what we do?

I argued that was possible and desirable, but you argued it was not
and got everybody else to agree with you. I'm surprised to see you
change your mind on that.

At your request I have specifically written the logic to avoid that.
I'm happy with that, for now, since what we have is correct, even if
we can do better later.

We can have fun with looping, live locks and weird errors another day.
SQL Standard says this area is implementation defined.

> Other things I noticed (not related to concurrency) following a fairly
> quick pass:
>
> * Basic tab completion support would be nice.

OK, but as most other patches do, that can be done later.

> * The SQL standard explicitly imposes this limitation:
>
> postgres=# explain merge into cities a using cities b on a.city =
> b.city when matched and (select 1=1) then update set country =
> b.country;
> ERROR: 0A000: cannot use subquery in WHEN AND condition
> LINE 1: ...sing cities b on a.city = b.city when matched and (select 1=...
> ^
> LOCATION: transformSubLink, parse_expr.c:1865
>
> Why do you, though? Do we really need it? I'm just curious about your
> thoughts on it. To be clear, I'm not asserting that you're wrong.

Which limitation? Not allowing sub-selects? They are not supported, as
the message says.

> * ISTM that this patch should have inserted/updated/deleted rows, in
> roughly the same style as ON CONFLICT's EXPLAIN ANALYZE output. I
> mentioned this already, and you seemed unclear on what I meant.
> Hopefully my remarks here are clearer.

Yes, thanks. Can do.

> * Subselect handling is buggy:
>
> postgres=# merge into cities a using cities b on a.city = b.city when
> matched and a.city = 'Straffan' then update set country = (select
> 'Some country');
> ERROR: XX000: unrecognized node type: 114
> LOCATION: ExecInitExprRec, execExpr.c:2114

Not buggy, subselects are not supported in WHEN AND clauses because
they are not part of the planned query, nor can they be if we want to
handle the WHEN clause logic per spec.

> * Why no CTE support? SQL Server has this.

The SQL Standard doesn't require CTEs or RETURNING syntax, but they
could in time be supported.

> * The INSERT ... SELECT syntax doesn't work:
>
> postgres=# merge into array_test a using (select '{1,2,3}'::int[] aaa)
> b on a.aaa = b.aaa when matched then update set aaa = default when not
> matched then insert (aaa) select '{1,2,3}';
> ERROR: 42601: syntax error at or near "select"
> LINE 1: ... aaa = default when not matched then insert (aaa) select '{1...
> ^
> LOCATION: scanner_yyerror, scan.l:1092
>
> But docs imply otherwise -- "source_query -- A query (SELECT statement
> or VALUES statement) that supplies the rows to be merged into the
> target_table_name". Either the docs are wrong, or the code is wrong.
> Hopefully you can just fix the code.

Neither. The docs show that is unsupported. The source query is the
USING phrase, there is no INSERT SELECT.

> * Rules are not going to be supported, on the grounds that the
> behavior is unclear, which I suppose is fine. But what about
> ruleutils.c support?
>
> That seems like entirely another matter to me. What about EXPLAIN,
> etc? Deparse support seems to be considered generic infrastructure,
> that doesn't necessarily have much to do with the user-visible rules
> feature.

The MERGE query is a normal query, so that should all just work.

EXPLAIN is specifically regression tested.

> * This restriction seems arbitrary and unjustified:
>
> postgres=# merge into testoids a using (select 1 "key", 'foo' "data")
> b on a.key = b.key when matched and a.oid = 5 then update set data =
> b.data when not matched then insert (key, data) values (1, 'foo');
> ERROR: 42P10: system column "oid" reference in WHEN AND condition is invalid
> LINE 1: ...'foo' "data") b on a.key = b.key when matched and a.oid = 5 ...
> ^
> LOCATION: scanRTEForColumn, parse_relation.c:738

I followed the comments of how we handle CHECK constraints.

It's hard to think of a real world example that would use that; your
example seems strange. Why would you want to use Oids in the WHEN AND
clause? In the ON or DML clauses, sure, but not there.

This is a non-standard feature, so we can decide whether to support
that or not. Allowing them is easy, just not very meaningful. Not sure
if it has consequences.

> * Wholerow vars are broken:
>
> postgres=# merge into testoids a using (select 1 "key", 'foo' "data")
> b on a.key = b.key when matched then update set data = b.*::text when
> not matched then insert (key, data) values (1, 'foo');
> ERROR: XX000: variable not found in subplan target lists
> LOCATION: fix_join_expr_mutator, setrefs.c:2351
>
> That's all I have for now.

Good catch; that one is a valid error. I hadn't tried to either
support them or block them.

Maybe its in the SQL Standard, not sure. Support for whole row vars
probably isn't a priority though.

Thanks for your comments.

--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Rick Otten 2018-01-29 21:35:53 Re: dsa_allocate() faliure
Previous Message David Steele 2018-01-29 21:29:08 Re: PATCH: Configurable file mode mask