Re: run pgindent on a regular basis / scripted manner

From: Magnus Hagander <magnus(at)hagander(dot)net>
To: Noah Misch <noah(at)leadboat(dot)com>
Cc: Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Andres Freund <andres(at)anarazel(dot)de>, Bruce Momjian <bruce(at)momjian(dot)us>, Jelte Fennema <postgres(at)jeltef(dot)nl>, Peter Geoghegan <pg(at)bowt(dot)ie>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Stephen Frost <sfrost(at)snowman(dot)net>, Jesse Zhang <sbjesse(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: run pgindent on a regular basis / scripted manner
Date: 2023-04-23 22:01:24
Message-ID: CABUevEwO7bGu_4QHcs3QuVxpa53HDpntQcACAT6rzDv9XTZcgg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Sat, Apr 22, 2023 at 9:59 PM Noah Misch <noah(at)leadboat(dot)com> wrote:
>
> (Given that another commentator is "absolutely against" a hook, this message
> is mostly for readers considering this for other projects.)
>
> On Sat, Apr 22, 2023 at 03:23:59PM +0200, Magnus Hagander wrote:
> > On Tue, Feb 7, 2023 at 5:43 AM Noah Misch <noah(at)leadboat(dot)com> wrote:
> > > On Mon, Feb 06, 2023 at 06:17:02PM +0100, Peter Eisentraut wrote:
> > > > Also, pgindent takes tens of seconds to run, so hooking that into the git
> > > > push process would slow this down quite a bit.
> > >
> > > The pre-receive hook would do a full pgindent when you change typedefs.list.
> > > Otherwise, it would reindent only the files being changed. The average push
> > > need not take tens of seconds.
> >
> > It would probably ont be tens of seconds, but it would be slow. It
> > would need to do a clean git checkout into an isolated environment and
> > spawn in there, and just that takes time.
>
> That would be slow, but I wouldn't do it that way. I'd make "pg_bsd_ident
> --pre-receive --show-diff" that, instead of reading from the filesystem, gets
> the bytes to check from the equivalent of this Perl-like pseudocode:
>
> while (<>) {
> my($old_hash, $new_hash, $ref) = split;
> foreach my $filename (split /\n/, `git diff --name-only $old_hash..$new_hash`) {
> $file_content = `git show $new_hash $filename`;
> }
> }
>
> I just ran pgindent on the file name lists of the last 1000 commits, and
> runtime was less than 0.5s for each of 998/1000 commits. There's more a real
> implementation might handle:
>
> - pg_bsd_indent changes
> - typedefs.list changes
> - skip if the break-glass "pgindent: no" appears in a commit message
> - commits changing so many files that a clean "git checkout" would be faster

Wouldn't there also be the case of a header file change that could
potentially invalidate a whole lot of C files?

There's also the whole potential problem of isolations. We need to run
the whole thing in an isolated environment (because any way in at this
stage could lead to an exploit if a committer key is compromised at
any point). And at least in the second case, it might not have access
to view that data yet because it's not in... Could probably be worked
around, but not trivially so.

(But as mentioned above, I think the conclusion is we don't want this
enforced in a receive hook anyway)

> > And it would have to also
> > know to rebuild pg_bsd_indent on demand, which would require a full
> > ./configure run (or meson equivalent). etc.
> >
> > So while it might not be tens of seconds, it most definitely won't be fast.
>
> A project more concerned about elapsed time than detecting all defects might
> even choose to take no synchronous action for pg_bsd_indent and typedefs.list
> changes. When a commit changes either of those, the probability that the
> committer already ran pgindent rises substantially.

True, but it's far from 100% -- and if you got something in that
didn't work, then the *next* committer would have to clean it up....

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Michael Paquier 2023-04-24 00:57:10 Re: buffer refcount leak in foreign batch insert code
Previous Message Jim Jones 2023-04-23 21:56:39 Re: xmlserialize bug - extra empty row at the end