Re: plpgsql_check_function - rebase for 9.3

From: Pavel Stehule <pavel(dot)stehule(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: Josh Berkus <josh(at)agliodbs(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: plpgsql_check_function - rebase for 9.3
Date: 2013-03-26 19:19:31
Message-ID: CAFj8pRC6+Ri_HfkiWX=NF9cW3nU3O1ktiPkwnVp+WmbCOn7qJA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello all

2013/3/26 Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>:
> Josh Berkus <josh(at)agliodbs(dot)com> writes:
>> Where are we with this patch? I'm a bit unclear from the discussion on
>> whether it passes muster or not. Things seem to have petered out.
>
> I took another look at this patch tonight. I think the thing that is
> bothering everybody (including Pavel) is that as submitted, pl_check.c
> involves a huge amount of duplication of knowledge and code from
> pl_exec.c, and to a lesser extent pl_comp.c. It certainly looks like a
> maintenance disaster in the making. It doesn't bother me so much that
> pl_check.c knows about each syntactic structure in plpgsql: there are
> already four or five places you have to touch when adding new syntax.
> Rather, it's that there's so much duplication of knowledge about
> semantic details, which is largely expressed by copied-and-pasted code
> from pl_exec.c. It seems like a safe bet that we'll sometimes miss the
> need to fix pl_check.c when we fix something in pl_exec.c. There are
> annoying duplications from pl_comp.c as well, eg knowledge of exactly
> which magic variables are defined in trigger functions.
>
> Having said all that, it's not clear that we can really do any better.
> The only obvious alternative is pushing support for a checking mode
> directly into pl_exec.c, which would obfuscate and slow down that code
> to an unacceptable degree if Pavel's results at
> http://www.postgresql.org/message-id/CAFj8pRAKuJmVjPjzfSryE7+uB8jF8Wtz5rkxK-0ykXme-k81kA@mail.gmail.com
> are any indication. (In that message, Pavel proposes shoveling the
> problem into the compile code instead, but that seems unlikely to work
> IMO: the main problem in pl_check.c as it stands is duplication of code
> from pl_exec.c not pl_comp.c. So I think that path could only lead to
> duplicating the same code into pl_comp.c.)
>
> So question 1 is do we want to accept that this is the implementation
> pathway we're going to settle for, or are we going to hold out for a
> better one? I'd be the first in line to hold out if I had a clue how
> to move towards a better implementation, but I have none. Are we
> prepared to reject this type of feature entirely because of the
> code-duplication problem? That doesn't seem attractive either.

I wrote lot of versions and proposed code is redundant, but most
simple and clean.

I am really against to pushing check to pl_exec, because it
significantly decrease code readability and increase some bottle neck
in CPU extensive tests. More, there are too less place for some future
feature - performance advising, more verbose error messages, etc

In PL/pgPSM I used a little bit different architecture - necessary for
PSM and maybe better for PL/pgSQL too. It is two stage compiler -
parsing to AST, and AST compilation. It simplify gram.y and processing
order depends on AST deep iteration and not on bizon rules. It can
have a impact on speed of very large procedures probably - I don't see
different disadvantages. With this architecture I was able do lot of
controls to compile stage without problems.

Most complexity in current code is related to detecting record types
from expressions without expression evaluation. Maybe this code can be
in core or pl_compile.c. Code for Describe (F) message is not too
reusable. It is necessary for

DECLARE r RECORD;
FOR r IN SELECT ...
LOOP
RAISE NOTICE '>>%<<', r.xx;
END LOOP;

>
> But, even granting that this implementation approach is acceptable,
> the patch does not seem close to being committable as-is: there's
> a lot of fit-and-finish work yet to be done. To make my point I will
> just quote from one of the regression test additions:
>
> create or replace function f1()
> returns void as $$
> declare a1 int; a2 int;
> begin
> select 10,20 into a1;
> end;
> $$ language plpgsql;
> -- raise warning
> select plpgsql_check_function('f1()');
> plpgsql_check_function
> -------------------------------------------------------------------------
> warning:00000:4:SQL statement:too many attributies for target variables
> Detail: There are less target variables than output columns in query.
> Hint: Check target variables in SELECT INTO statement
> (3 rows)
>
> Do we like this output format? I don't. The unlabeled, undocumented
> fields crammed into a single line with colon separators are neither
> readable nor useful. If we actually need these fields, why aren't we
> splitting the output into multiple columns? (I'm also wondering why
> the patch bothers with an option to emit this same info in XML. Surely
> there is vanishingly small use-case for mechanical examination of this
> output.)

This format can be reduced, redesigned, changed. It is designed like
gcc output and optimized for using from psql console.

I tested table output - in original CHECK statement implementation,
but it is not too friendly for showing in monitor - it is just too
wide. There are similar arguments like using tabular output for
EXPLAIN, although there are higher complexity and nested structures
(in EXPLAIN). Personally, I can live with tabular output, it is not
user friendly, but it can be used for machine processing simply, and
any advanced user can write some transform functions to any output
format.

>
> This example also shows that the user-visible messages have had neither
> editing from a native speaker of English, nor even attention from a
> spell checker. I don't fault Pavel for that (his English is far better
> than my Czech) but this patch has been passed by at least one reviewer
> who is a native speaker. Personally I'd also say that neither the
> Detail nor Hint messages in this example are worth the electrons they
> take up, as they add nothing useful to the basic error message. I'd be
> embarrassed to be presenting output like this to end users of Postgres.
>
> (The code comments are in even worse shape than the user-visible
> messages, by the by, where they exist at all.)
>
> A somewhat related point is that it doesn't look like any thought
> has been taken about localized message output.
>
> This stuff is certainly all fixable if we agree on the basic
> implementation approach; and I can't really fault Pavel for not
> worrying much about such details while the implementation approach
> hasn't been agreed to. But I'd judge that there's a week or more's
> worth of work here to get to a patch I'd be happy about committing,
> even without any change in the basic approach. That's not time I'm
> willing to put in personally right now.

Yes. A details should be precised after final decision. I am hope so
this code doesn't contains significant bug - I have not any reported
issue related to this functionality - it is based on plpgsql_lint -
and this code is used in production by some very large companies. Of
course, there are not too large community - it should be manually
compiled.

Regards

Pavel

>
> regards, tom lane

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Josh Berkus 2013-03-26 19:19:34 Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)
Previous Message Tom Lane 2013-03-26 19:10:33 Re: sql_drop Event Triggerg