Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Christoph Berg <christoph(dot)berg(at)credativ(dot)de>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Michael Paquier <michael(dot)paquier(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Reproducible builds: genbki.pl and Gen_fmgrtab.pl
Date: 2017-12-21 02:50:37
Message-ID: 7236.1513824637@sss.pgh.pa.us
Lists: pgsql-hackers

Christoph Berg <christoph(dot)berg(at)credativ(dot)de> writes:
> Re: Tom Lane 2017-12-16 <417(dot)1513438031(at)sss(dot)pgh(dot)pa(dot)us>
>> I think we're talking at cross-purposes. I'm not saying we should not fix
>> this problem. I'm saying that the proposed fix appears incomplete ...

> Grepping through the source, there are three places where $0 is printed
> to files in regular operation (as opposed to being used in --help):

I poked around and found a few more.

> I believe the reason why we've only been seeing half of the problem
> so far is that the generated files are shipped with the tarballs, so it
> might be a timestamping issue determining if the scripts are
> re-executed.

Right; some parts of this problem would only materialize for you if you
needed to rebuild the generated files that are included in the tarball,
which should basically not be happening in normal packager builds.
Rather the risk is at our end: if we ever switched the tarball creation
process to be a VPATH build, then there'd be path dependencies in the
created tarballs. That would be bad.

More generally, my concern here is not just that we fix this problem
but that it stays fixed. If some individual scripts print $0 into
their output and it happens to not affect any built distribution files
today, it's still bad, because tomorrow somebody might copy that coding
pattern into someplace else where it matters more. I think we need a
project policy that thou shalt not print $0 into generated files, period.

Also, experimenting with a VPATH build, I verified that such "helpful"
practices as printing $infile or @ARGV into the output file will also
create path dependencies. So I think we need to lose those too.
It's not like they're adding any info you can't find out from the
Makefiles.

On the other hand, there is something we can do that will improve
matters: rather than just printing the base name of the script,
let's print its full relative path within the PG sources, e.g., instead
of Gen_fmgrtab.pl let's print src/backend/utils/Gen_fmgrtab.pl.
My thought here is that if you're not already intimately familiar
with a script you might not remember where it lives, the more so
if you're looking at a file that's been put into an installation
tree far away from where it was generated. I see that this policy
was already followed in some places, just not in the ones that were
using the $0 shortcut.
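To make the principle concrete, here is a minimal sketch of the two header styles. It is written in Python rather than Perl for brevity, and it is not taken from the attached patch: SCRIPT_RELPATH, header_reproducible, header_path_dependent, the banner text and the sample paths are all hypothetical. It shows that a banner built from a hard-coded source-relative script path plus the output's base name comes out the same for an in-tree build and a VPATH build. A banner that echoes the invocation path (the analogue of $0) or the input-file paths does not.

```python
import os
from typing import Sequence

# Hypothetical constant: the generator's location relative to the top of the
# source tree, written as a literal rather than derived from sys.argv[0]
# (Python's counterpart of Perl's $0).
SCRIPT_RELPATH = "src/backend/utils/Gen_fmgrtab.pl"


def header_reproducible(output_path: str) -> str:
    """Return a banner comment that does not depend on the build location.

    Only the output file's base name and the fixed SCRIPT_RELPATH are
    printed; the invocation path, input paths and argument list are not.
    """
    name = os.path.basename(output_path)
    return (
        "/*\n"
        f" * {name}\n"
        " *\n"
        f" * Generated by {SCRIPT_RELPATH}; do not edit.\n"
        " */\n"
    )


def header_path_dependent(output_path: str, argv0: str,
                          infiles: Sequence[str]) -> str:
    """Anti-pattern for comparison: echoes $0 and the input file paths."""
    name = os.path.basename(output_path)
    return (
        "/*\n"
        f" * {name}\n"
        " *\n"
        f" * Generated by {argv0} from {' '.join(infiles)}; do not edit.\n"
        " */\n"
    )


if __name__ == "__main__":
    # Hypothetical invocations of the same generator: one from inside the
    # source tree, one from a separate VPATH build directory.
    in_tree = ("fmgroids.h", "Gen_fmgrtab.pl", ["input.h"])
    vpath = ("/tmp/build/src/backend/utils/fmgroids.h",
             "/tmp/src/backend/utils/Gen_fmgrtab.pl",
             ["/tmp/src/include/catalog/input.h"])

    same_good = header_reproducible(in_tree[0]) == header_reproducible(vpath[0])
    same_bad = header_path_dependent(*in_tree) == header_path_dependent(*vpath)
    print("reproducible header identical:", same_good)   # True
    print("path-dependent header identical:", same_bad)  # False
```

The literal path costs one line per script, but it never varies. By contrast, $0, $infile and @ARGV all reflect wherever the build happens to run.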

In short, I propose the attached more-extensive patch.

Some of the files generated by these scripts, particularly the map
files generated by the src/backend/utils/mb/Unicode/ scripts, are
not just present in tarballs but are actually in our git repo.
So changing those scripts won't affect anything until/unless someone
updates the repo's generated files, which I've not done here and
don't feel much need to do. I just want to establish a principle
that we don't print path-dependent info into generated files.

regards, tom lane

Attachment: reproducible-headers-3.patch (text/x-diff, 8.5 KB)
