Quick Links

Re: [HACKERS] MERGE SQL Statement for PG11

From:	Peter Geoghegan <pg(at)bowt(dot)ie>
To:	Simon Riggs <simon(at)2ndquadrant(dot)com>
Cc:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Pavan Deolasee <pavan(dot)deolasee(at)2ndquadrant(dot)com>, Robert Haas <robertmhaas(at)gmail(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject:	Re: [HACKERS] MERGE SQL Statement for PG11
Date:	2018-01-29 22:31:21
Message-ID:	CAH2-WzkAjSN1H-ym-sSDh+6EJWmEhyHdDStzXDB+Fxt1hcKEgg@mail.gmail.com
Views:	Raw Message \| Whole Thread \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Mon, Jan 29, 2018 at 1:34 PM, Simon Riggs <simon(at)2ndquadrant(dot)com> wrote:
>> The way that routines like ExecUpdate() interact with MERGE for WHEN
>> qual + EPQ handling seems kind of convoluted. I hope for something
>> cleaner in the next revision.
>
> Cleaner?

Yeah, cleaner. The fact that when quals kind of participate in EPQ
evaluation without that being in execMain.c seems like it could be a
lot cleaner. I don't have more specifics than that right now.

>> I don't think that ERRCODE_T_R_SERIALIZATION_FAILURE is ever okay in
>> READ COMMITTED mode.
>
> We use that code already in Hot Standby in READ COMMITTED mode.
>
> What code should it be? It needs to be a retryable errcode.

I don't think that there should be any error, so I can't say.

>> Anyway, I wonder why we shouldn't just go ahead
>> an do the WHEN NOT MATCHED INSERT on the basis of information which is
>> not visible to our snapshot. We choose to not UPDATE/DELETE on the
>> basis of information from the future already, within EPQ. My first
>> feeling is that this is a distinction without a difference, and you
>> should actually go and INSERT at this point, though I reserve the
>> right to change my mind about that.
>>
>> Yeah, EPQ semantics are icky, but would actually inserting here really
>> be any worse than what we do?
>
> I argued that was possible and desirable, but you argued it was not
> and got everybody else to agree with you. I'm surprised to see you
> change your mind on that.

You're mistaken. Nothing I've said here is inconsistent with my
previous remarks on how we deal with concurrency.

> At your request I have specifically written the logic to avoid that.
> I'm happy with that, for now, since what we have is correct, even if
> we can do better later.
>
> We can have fun with looping, live locks and weird errors another day.
> SQL Standard says this area is implementation defined.

Who said anything about a loop? Where is the loop here?

I will rephrase what I said, to make it top-down rather than
bottom-up, which may make my intent clearer:

According to your documentation, "MERGE provides a single SQL
statement that can conditionally INSERT, UPDATE or DELETE rows, a task
that would otherwise require multiple procedural language statements".
But you're introducing a behavior/error that would not occur in
equivalent procedural client code consisting of an UPDATE followed by
a (conditionally executed) INSERT statement when run in READ COMMITTED
mode. You actually get exactly the concurrency issue that you cite as
unacceptable in justifying your serialization error with such
procedural code (when the UPDATE didn't affect any rows, only
following EPQ walking the UPDATE chain from the snapshot-visible
tuple, making the client code decide to do an INSERT on the basis of
information "from the future").

I think that it isn't this patch's job to make READ COMMITTED mode any
safer than it is in that existing scenario. A scenario that doesn't
involve ON CONFLICT at all.

>> * The SQL standard explicitly imposes this limitation:
>>
>> postgres=# explain merge into cities a using cities b on a.city =
>> b.city when matched and (select 1=1) then update set country =
>> b.country;
>> ERROR: 0A000: cannot use subquery in WHEN AND condition
>> LINE 1: ...sing cities b on a.city = b.city when matched and (select 1=...
>> ^
>> LOCATION: transformSubLink, parse_expr.c:1865
>>
>> Why do you, though? Do we really need it? I'm just curious about your
>> thoughts on it. To be clear, I'm not asserting that you're wrong.
>
> Which limitation? Not allowing sub-selects? They are not supported, as
> the message says.

I'm simply asking you to explain why you think that it would be
problematic or even impossible to support it. The question is asked
without any agenda. I'm verifying my own understanding, as much as
anything else. I've acknowledged that the standard has something to
say on this that supports your position, which has real weight.

>> * Subselect handling is buggy:
>>
>> postgres=# merge into cities a using cities b on a.city = b.city when
>> matched and a.city = 'Straffan' then update set country = (select
>> 'Some country');
>> ERROR: XX000: unrecognized node type: 114
>> LOCATION: ExecInitExprRec, execExpr.c:2114
>
> Not buggy, subselects are not supported in WHEN AND clauses because
> they are not part of the planned query, nor can they be if we want to
> handle the WHEN clause logic per spec.

I'm not asking about WHEN AND here (that was my last question). I'm
asking about a subselect that appears in the targetlist.

(In any case, "unrecognized node type: 114" seems buggy to me in any context.)

>> * Why no CTE support? SQL Server has this.
>
> The SQL Standard doesn't require CTEs or RETURNING syntax, but they
> could in time be supported.

No time like the present. Is there some reason why it would be
difficult with our implementation of CTEs? I can't think why it would
be.

>> But docs imply otherwise -- "source_query -- A query (SELECT statement
>> or VALUES statement) that supplies the rows to be merged into the
>> target_table_name". Either the docs are wrong, or the code is wrong.
>> Hopefully you can just fix the code.
>
> Neither. The docs show that is unsupported. The source query is the
> USING phrase, there is no INSERT SELECT.

My mistake.

>> * Rules are not going to be supported, on the grounds that the
>> behavior is unclear, which I suppose is fine. But what about
>> ruleutils.c support?
>>
>> That seems like entirely another matter to me. What about EXPLAIN,
>> etc? Deparse support seems to be considered generic infrastructure,
>> that doesn't necessarily have much to do with the user-visible rules
>> feature.
>
> The MERGE query is a normal query, so that should all just work.

Look at get_query_def(), for example. That clearly isn't going to work
with MERGE.

> EXPLAIN is specifically regression tested.

You do very little with EXPLAIN right now, though. More importantly, I
think that this is considered a necessary piece of functionality, even
if no core code uses it. There are definitely third-party extensions
that use ruleutils in a fairly broad way.

> I followed the comments of how we handle CHECK constraints.
>
> It's hard to think of a real world example that would use that; your
> example seems strange. Why would you want to use Oids in the WHEN AND
> clause? In the ON or DML clauses, sure, but not there.
>
> This is a non-standard feature, so we can decide whether to support
> that or not. Allowing them is easy, just not very meaningful. Not sure
> if it has consequences.

It's easier to just support them. That way we don't have to think about it.

>> * Wholerow vars are broken:
>>
>> postgres=# merge into testoids a using (select 1 "key", 'foo' "data")
>> b on a.key = b.key when matched then update set data = b.*::text when
>> not matched then insert (key, data) values (1, 'foo');
>> ERROR: XX000: variable not found in subplan target lists
>> LOCATION: fix_join_expr_mutator, setrefs.c:2351
>>
>> That's all I have for now.
>
> Good catch; that one is a valid error. I hadn't tried to either
> support them or block them.
>
> Maybe its in the SQL Standard, not sure. Support for whole row vars
> probably isn't a priority though.

I think that this needs to work, on general principle. Again, just
fixing it is much easier than arguing about it.

--
Peter Geoghegan

In response to

Re: [HACKERS] MERGE SQL Statement for PG11 at 2018-01-29 21:34:45 from Simon Riggs

Responses

Re: [HACKERS] MERGE SQL Statement for PG11 at 2018-01-30 09:45:49 from Simon Riggs

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Tomas Vondra	2018-01-29 22:49:14	Re: JIT compiling with LLVM v9.0
Previous Message	Andres Freund	2018-01-29 22:17:22	Re: JIT compiling with LLVM v9.0