Re: merging some features from plpgsql2 project

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>
Cc: Merlin Moncure <mmoncure(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>, Marko Tiikkaja <marko(at)joh(dot)to>, Joel Jacobson <joel(at)trustly(dot)com>
Subject: Re: merging some features from plpgsql2 project
Date: 2017-01-07 08:06:55
Message-ID: CAFj8pRBydQbQ2eC3BkobShmNam=_7ezMS-4vaQi=G75LNF2uBA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

2017-01-03 17:57 GMT+01:00 Jim Nasby <Jim(dot)Nasby(at)bluetreble(dot)com>:

> On 1/2/17 1:51 PM, Pavel Stehule wrote:
>
>> 1) Neither is enabled by default, so 90% of users have no idea they
>> exist. Obviously that's an easy enough fix, but...
>>
>> We can strongly talk about it - there can be a chapter in plpgsql doc.
>> Now, the patterns and antipatterns are not officially documented.
>>
>
> Or just fix the issue, provide the backwards compatability GUCs and move
> on.
>
> 2) There's no way to incrementally change those values for a single
>> function. If you've set extra_errors = 'all' globally, a single
>> function can't say "turn off the too many rows setting for this
>> function".
>>
>>
>> We can enhance the GUC syntax like "all -too_many_rows,-xxx"
>>
>
> Why create all that framework when we could just have multiple
> plpgsql.blah GUCs? plpgsql.multirow_assign_level=FATAL solves that
> problem. We just need a plpgsql GUC for each backwards compatibility break.
>
> BTW, while I can see value in being able to change these settings
>> for an entire function, I think the recommended use should be to
>> only change them for a specific statement.
>>
>>
>> What you can do in plain assign statement
>>
>> target := expression ?
>>
>
> The point I was trying to make there is if you do have some cases where
> you need to silently ignore extra rows (for example) it's probably only one
> statement and not an entire function. That said, if we just make these
> options GUCs then you can just do SET and RESET.
>
> My border is any compatibility break - and I would not to across it.
>> First issue is probably harder
>>
>
> If we never broke compatibility we'd still be allowing SELECT without
> FROM, NULL = NULL being TRUE, and a whole bunch of other problems. We'd
> also be stuck on protocol v1 (and of course not talking about what we want
> in v4).
>
> We've successfully made incompatible changes that were *far worse* than
> this (ie: renaming pg_stat_activity.procpid). Obviously we shouldn't be
> breaking things willy-nilly, but these are long-standing warts (dare I say
> BUGS?) that should be fixed. They're ugly enough that someone took the time
> to break plpgsql out of the core code and fork it.

The discussion about changing behave of current features has not a
solution. I don't believe so there is possible to find a win/win solution -
it is not possible. I respect the opinion all people here, but somebody
more afraid of language fragmentation, somebody else more from some
possible production issues. All arguments are valid, all arguments has same
value, all arguments are sent by people with lot of experience (but with
different experience - anybody works with different domains, uses different
patters, hits different kind of errors).

This discussion was +/- about behave of INTO clause - and if STRICT clause
should be default and if STRICT clause should be more strict.

if we introduce new pattern (new syntax), that can be strict enough then we
can go forward. New syntax will not have a impact on current customers code
base. If somebody prefer very strict behave, then he can use new syntax
quickly. Issues in old code can be detected by other tools - plpgsql_check,
and extra_warnings, extra_errors.

Current state
==========

INTO
-------
* is not strict in any directions - columns or rows
* not equal target and source column number cannot be detected in plpgsql
(plpgsql_check does it)
* missing rows - uses FOUND (PL/SQL), too much rows - uses GET DIAGNOSTICS
x = ROW_COUNT (ANSI/SQL pattern)
* doesn't need outer subtransaction (handled exception) for handling
specific situations

select a into t from test where a = i;
get diagnostics r = row_count;
if r > 1 then raise exception 'too much rows'; end if;

INTO STRICT
------------------
* is strict in rows - require exactly one row result
* not equal target and source column number cannot be detected in plpgsql
(plpgsql_check does it)
* raises a exceptions no_data_found, too_much_rows
* require outer subtransaction (handled exception) for handling
no_data_found (10% slowdown in worst case)

begin
select a into t from test where a = i;
exception
when no_data_found then
t = null;
end;

There is safe pattern (workaround) for missing columns in source - using
RECORD type. Access to missing field raises runtime error. Another solution
- plpgsql_check.

subselect assignment
-----------------------------
* is column strict - doesn't allow more columns now
* don't allow too_much_rows
* the FOUND is always true - has not change to check it
* although it is ANSI/SQL pattern it is undocumented in PostgreSQL (only
INTO clause is documented)

x := (select a from test where a = i)

officially unsupported implementation side effect assignment FROM
------------------------------------------------------------------------------------------
* is undocumented
* allows only one column
* behave is similar like INTO

x := a from test where a = i

Possible risk
============
* typo SELECT a b INTO a,b or SELECT a,b INTO b,a - currently I use a
RECORD type to be safe
* undetermined result in more rows
* unknown result in no_data_found

The questions - what we expect from new behave?
=======================================
* safety against typo - 100% yes
* safety against undetermined result - 100% yes

* what behave should be default in no_data_found case ? Exception or NULL?
- both has advantages for who require designed behave and disadvantage for
require second behave (a) handling the exception, b) IF NOT FOUND test)

Possible solution (without compatibility break)
===================================

introduction ANSI/SQL assingnment, multiassignment SET
------------------------------------------------------------------------------

SET varname = (SELECT ...)
SET (t1, t2, t3, ...) = (SELECT c1, c2, c3 ..)

- possible identifier collision with GUC (not in multiassignment variant)
- is not too robust against typo SET (t1, t2,..) = (SELECT t2,t1, ...)
- it is ANSI/SQL design/behave

introduction new syntax proposed by plpgsql2 project
-----------------------------------------------------------------------

SELECT t1 := c1, t2 := c2, ...

- it can be PostgreSQL specific syntax - full control over design
- maximally robust against typo
- long syntax, but for short syntax can be used SELECT c1,c2,c3, .. INTO
STRICT recvar
- what should be no_data_found behave?

I have nothing about a cost of "new syntax" implementation - but for me -
it looks like good solution for us - it can be win/win solution. It breaks
nothing - it introduce nice to have typo robust syntax.

Comments, notes?

Regards

Pavel

>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrey Borodin 2017-01-07 08:30:04 Re: autonomous transactions
Previous Message amul sul 2017-01-07 06:44:23 Re: pg_background contrib module proposal