Re: Allow declaration after statement and reformat code to use it

From: Ranier Vilela <ranier(dot)vf(at)gmail(dot)com>
To: Jelte Fennema <Jelte(dot)Fennema(at)microsoft(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow declaration after statement and reformat code to use it
Date: 2021-08-19 13:38:28
Message-ID: CAEudQArYwzzt5NmzAEdQ5ahHyYBz2F_G-vGBdQmW87CC4TGimA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Em qui., 19 de ago. de 2021 às 08:50, Jelte Fennema <
Jelte(dot)Fennema(at)microsoft(dot)com> escreveu:

> ## What is this?
>
> It's a draft patch that replaces code like this:
> ```c
> pg_file_unlink(PG_FUNCTION_ARGS)
> {
> char *filename;
> requireSuperuser();
> filename = convert_and_check_filename(PG_GETARG_TEXT_PP(0));
> ```
>
> With this shorter version:
> ```c
> pg_file_unlink(PG_FUNCTION_ARGS)
> {
> requireSuperuser();
> char *filename = convert_and_check_filename(PG_GETARG_TEXT_PP(0));
> ```
>
> ## Why would we want this?
> 1. It removes 23328 lines of code that don't have any impact on how the
> code behaves [1]. This is roughly 2.7% of all the lines of code in the
> codebase.
> 2. Declarations are closer to the actual usage. This is advised by the
> "Code Complete" book [2] and has the following advantages:
> a. This limits variable scope to what is necessary. Which in turn makes
> the mental model you have to keep of a function when reading the code
> simpler.
> b. In most cases it allows you to see the the type of a variable
> without needing to go to the top of the function.
> 3. You can do input checking and assertions at the top of the function,
> instead of having to put declarations in front of it. This makes it clear
> which invariants hold for the function. (as seen in the example above and
> the changes for pg_file_rename[3])
> 4. Declaring variables after statements is allowed in C99. Postgres
> already requires C99, so it might as well use this feature too.
>
> ## How was this changeset created?
>
> I created a script that modifies all C files in the following way:
>
> 1. Find a non `static` declaration.
> 2. If it has an initializer and it is not a single variable/constant,
> don't consider replacing it. (reason: a function call initializer might
> have sideffects).
> 3. Otherwise (it's a simple initializer or it has no initializer at all),
> take the type and variable from that declaration.
> 4. Find the next use of the variable.
> 5. If the next use is not an assignment, don't do anything (the value of
> the original initialization is used).
> 6. If the next use is an assignment:
> 1. Remove the old declaration
> 2. Prefix the found assignment with the type
> 3. Unless the variable is also used in the same line of the new
> initialization, e.g:
> ```c
> int a = 1;
> a = a + a;
> ```
>
> ## How does this script work?
>
> It uses a Perl regex to search and replace! (obligatory jokes at the
> bottom of the email) This regex is based on the ones used in this PR to
> citus[4] and the similar PR to pg_auto_failover[5]. The script is in the
> 3rd commit of this patchset.
>
> To visualize the regex in the script in a reasonable way, copy paste the
> search part of it:
> ```
> \n\t(?!(return|static)\b)(?P<type>(\w+[\t ])+[\t *]*)(?>(?P<variable>\w+)(
> =
> [\w>\s\n-]*?)?;\n(?P<code_between>(?>(?P<comment_or_string_or_not_preprocessor>\/\*.*?\*\/|"(?>\\"|.)*?"|(?!goto)[^#]))*?)(\t)?(?=\b(?P=variable)\b))(?<=\n\t)(?<!:\n\t)(?P=variable)
> =(?![^;]*?[^>_]\b(?P=variable)\b[^_])
> ```
>
> And paste that into https://www.debuggex.com/, then select PCRE from the
> dropdown. (Sharing seems to be down at this point, so this is the only way
> to do it at the moment) Try it out! The regex is not as crazy as it looks.
>
> There's two important assumptions this regex makes:
> 1. Code is indented using tabs, and the intent level determines the scope.
> (this is true, because of `pgindent`)
> 2. Declared variables are actually used. (this is true, because we would
> get compiler warnings otherwise)
>
> There's two cases where this regex has some special behaviour:
> 1. Stop searching for the first usage of a variable when either a `goto`
> or a preprocessor command is found (outside a string or comment). These
> things affect the control flow in a way that the regex does not understand.
> (any `#` character is assumed to be a preprocessor commands).
> 2. Don't replace if the assignment is right after a `label:`, by checking
> if there was a `:` as the last character on the previous line. This works
> around errors like this:
> ```
> hashovfl.c:865:3: error: a label can only be part of a statement and a
> declaration is not a statement
> OffsetNumber maxroffnum = PageGetMaxOffsetNumber(rpage);
> ^~~~~~~~~~~~
> ```
> Detecting this case in this way is not perfect, because sometimes there is
> an extra newline or a comment between the label and the assignment. This is
> not easily detectable by the regex, because lookbehinds cannot have a
> variable length in Perl (or most regex engines for that matter). For these
> few cases (5 in the entire codebase) a manual change was done either before
> or after the automatic replacement to fix them so the code compiles again.
> (2nd and 5th commits of this patchset)
>
> After all these changes `make -s world` doesn't show any warnings or
> errors and `make check-world` succeeds. I configured compilation like this:
> ```
> ./configure --enable-cassert --enable-depend --enable-debug --with-openssl
> --with-libxml --with-libxslt --with-uuid=e2fs --with-icu
> ```
>
> ## What I want to achieve with this email
>
> 1. For people to look at a small sample of the changes made by this
> script. I will try to send patches with the actual changes made to the code
> as attachments in a followup email. However, I don't know if that will
> succeed since the patch files are very big and it might run into some
> mailserver limits. So in case that doesn't work, the full changeset is
> available on Github[6]. This make viewing this enormous diff quite a bit
> easier IMHO anyaway. If you see something weird that is not covered in the
> "Known issues" section below, please share it. That way it can be discussed
> and/or fixed.
> 2. A discussion on if this type of code change would be a net positive for
> Postgres codebase. Please explain clearly why or why not.
> 3. Some links to resources on what's necessary to get a big refactoring
> patch like this merged.
>
> ## What I don't want to achieve with this email
>
> 1. For someone to go over all the changes right now. There's likely to be
> changes to the script or something else. Doing a full review of the changes
> would be better saved for later during a final review.
>
> ## Known issues with the currently generated code
>
> There's a few issues with the final generated code that I've already
> spotted. These should all be relatively easy to fix in an automated way.
> However, I think doing so is probably better done by `pgindent` or some
> other auto formatting tool, instead of with the regex. Note that I did run
> `pgindent`, it just didn't address these things:
>
> 1. Whitespace between type and variable is kept the same as it was before
> moving the declaration. If the original declaration had whitespace added in
> such a way that all variable names of declarations aligned, then this
> whitespace is preserved. This looks weird in various cases. See [3] for an
> example.
> 2. `pgindent` adds a newline after each block of declarations, even if
> they are not at the start of function. If this is desired is debatable, but
> to me it seems like it adds excessive newlines. See [7] for an example.
> 3. If all declarations are moved away from the top of the function, then
> an empty newline is kept at the top of the function. This seems like an
> unnecessary newline to me. See [3] for an example.
>
> Sources:
> 1. `tokei`[8] results of lines of code:
> Before:
> ```
> $ tokei --type C
>
> ===============================================================================
> Language Files Lines Code Comments
> Blanks
>
> ===============================================================================
> C 1389 1361309 866514 330933
> 163862
>
> ===============================================================================
> Total 1389 1361309 866514 330933
> 163862
>
> ===============================================================================
> ```
> After:
> ```
> $ tokei --type C
>
> ===============================================================================
> Language Files Lines Code Comments
> Blanks
>
> ===============================================================================
> C 1389 1347515 843186 330952
> 173377
>
> ===============================================================================
> Total 1389 1347515 843186 330952
> 173377
>
> ===============================================================================
> ```
> 2.
> https://github.com/mgp/book-notes/blob/master/code-complete.markdown#103-guidelines-for-initializing-variables
> 3.
> ```patch
> @@ -401,11 +389,10 @@ pg_file_rename_internal(text *file1, text *file2,
> text *file3)
> Datum
> pg_file_unlink(PG_FUNCTION_ARGS)
> {
> - char *filename;
>
> requireSuperuser();
>
> - filename = convert_and_check_filename(PG_GETARG_TEXT_PP(0));
> + char *filename = convert_and_check_filename(PG_GETARG_TEXT_PP(0));
>
> if (access(filename, W_OK) < 0)
> ```
> 4. https://github.com/citusdata/citus/pull/3181
> 5. https://github.com/citusdata/pg_auto_failover/pull/266
> 6. https://github.com/JelteF/postgres/pull/2
> 7.
> ```patch
> @@ -2863,7 +2886,8 @@ palloc_btree_page(BtreeCheckState *state,
> BlockNumber blocknum)
> * longer than we must.
> */
> Buffer buffer = ReadBufferExtended(state->rel, MAIN_FORKNUM, blocknum,
> RBM_NORMAL,
> - state->checkstrategy);
> + state->checkstrategy);
> +
> LockBuffer(buffer, BT_READ);
>
> /*
> ```
> 8. https://github.com/XAMPPRocky/tokei
>
>
> Obligatory jokes:
> 1. https://xkcd.com/1171/
> 2. https://xkcd.com/208/
> 3. https://stackoverflow.com/a/1732454/2570866 (and serious response to
> it
> https://neilmadden.blog/2019/02/24/why-you-really-can-parse-html-and-anything-else-with-regular-expressions/
> )
>
-1 in short.

C needs readability, not fewer lines.
Aside from horrible code, it doesn't improve 0.1% on anything.

I think it's a bad idea and I'm strongly against it.

regards,
Ranier Vilela

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Ranier Vilela 2021-08-19 13:52:36 Re: Showing I/O timings spent reading/writing temp buffers in EXPLAIN
Previous Message Masahiko Sawada 2021-08-19 13:09:45 Re: Skipping logical replication transactions on subscriber side