Re: Portal->commandTag as an enum

From: John Naylor <john(dot)naylor(at)2ndquadrant(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Portal->commandTag as an enum
Date: 2020-02-19 11:31:20
Message-ID: CACPNZCs1McaYxMrBXC39rMBfFerQYyaDFHL1-H=8JFUnRE7Rhw@mail.gmail.com
Lists: pgsql-hackers

Hi Mark,

On Wed, Feb 19, 2020 at 10:40 AM Mark Dilger
<mark(dot)dilger(at)enterprisedb(dot)com> wrote:
> This would only make sense to me if the string held in $_ had already been checked for safety, but Catalog.pm does very little to verify that the string is safe to eval. The assumption here, so far as I can infer, is that we don’t embed anything dangerous in our .dat files, so this should be ok. That may be true for the moment, but I can imagine a day when we start embedding perl functions as quoted text inside a data file, or shell commands as text which look enough like perl for eval() to be able to execute them. Developers who edit these .dat files and mess up the quoting, and then rerun ‘make’ to get the new .c and .h files generated, may not like the side effects. Perhaps I’m being overly paranoid….

The use case for that seems slim. However, at a brief glance your
module seems more robust in other ways.

> Rather than add more code generation logic based on the design of Catalog.pm, I wrote a perl based data file parser that parses .dat files and returns vivified perl data, as Catalog.pm does, but with much stricter parsing logic to make certain nothing dangerous gets eval()ed. I put the new module in DataFile.pm.
> [...]
> The new parser is more flexible about the structure of the data, which seems good to me for making it easier to add or modify data files in the future. The new parser does not yet have a means of hacking up the data to add autogenerated fields and such that Catalog.pm does, but I think a more clean break between parsing and autovivifying fields would be good anyway.

Separation of concerns sounds like a good idea, but I've not fully
thought it through. For one advantage, I think it might be nicer to
have indexing.dat and toasting.dat instead of having to dig the info
out of C macros. This was evident while recently experimenting with
generating catcache control data.

As for the patch, I have not done a full review, but I have some
comments based on a light read-through:

utils/Makefile:

+# location of commandtag.dat
+headerdir = $(top_srcdir)/src/include/utils

This variable name is too generic for what the comment says it is. A
better abstraction, if we want one, would be the full path to the
commandtag input file. The other script invocations in this Makefile
don't do it this way, but that's a separate patch.

+# location to write generated headers
+sourcedir = $(top_srcdir)/src/backend/utils

Calling the output the source is bound to confuse people. The comment
implies all generated headers, not just the ones introduced by the
patch. I would just output to the current directory (i.e. have an
--output option with a default empty string). Also, if we want to
output somewhere else, I would imagine it'd be under the top builddir,
not srcdir.

+$(PERL) -I $(top_srcdir)/src/include/utils $<
--headerdir=$(headerdir) --sourcedir=$(sourcedir)
--inputfile=$(headerdir)/commandtag.dat

1. headerdir is entirely unused by the script
2. We can default to working dir for the output as mentioned above
3. -I $(top_srcdir)/src/include/utils is supposed to point to the dir
containing DataFile.pm, but since gencommandtag.pl has "use lib..."
it's probably not needed here. I don't recall why we keep the "-I"
elsewhere. (ditto in Solution.pm)

I'm thinking it would look something like this:

+$(PERL) $< --inputfile=$(top_srcdir)/src/include/utils/commandtag.dat

--
utils/misc/Makefile

+all: distprep
+
# Note: guc-file.c is not deleted by 'make clean',
# since we want to ship it in distribution tarballs.
clean:
@rm -f lex.yy.c
+
+maintainer-clean: clean

Seems non-functional.

--
DataFile.pm

I haven't studied this in detail, but I'll suggest that if this meant
to have wider application, maybe it should live in src/tools ?

I'm not familiar with using different IO routines depending on the OS
-- what's the benefit of that?

--
gencommandtag.pl

slurp_without_comments() is unused.

sanity_check_data() seems longer than the main body of the script
(minus header boilerplate), and I wonder if we can pare it down some.
For one, I have difficulty imagining anyone would accidentally type an
unprintable or non-ascii character in a command tag and somehow not
realize it. For another, duplicating checks that were done earlier
seems like a maintenance headache.
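For reference, the core of such a check can be quite small; a sketch of the sort of test I assume sanity_check_data() performs (the function name and shape here are mine, not from the patch):

```c
#include <stdbool.h>

/* Return true if tag is nonempty and contains only printable ASCII.
 * This approximates the unprintable/non-ascii check discussed above. */
static bool
tag_is_printable_ascii(const char *tag)
{
	if (*tag == '\0')
		return false;
	for (const unsigned char *p = (const unsigned char *) tag; *p; p++)
	{
		if (*p < 0x20 || *p > 0x7e)
			return false;
	}
	return true;
}
```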

dataerror() is defined near the top, but other functions are defined
at the bottom.

+# Generate all output internally before outputting anything, to avoid
+# partially overwriting generated files under error conditions

My personal preference is that having this as a design requirement
sacrifices readability for unclear gain, especially since a "chunk"
also includes things like header boilerplate. That said, the script is
also short enough that it doesn't make a huge difference either way.
Speaking of boilerplate, it's better for readability to separate that
from actual code such as:

typedef enum CommandTag
{
#define FIRST_COMMANDTAG COMMANDTAG_$sorted[0]->{taglabel}
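For what it's worth, the generated output with boilerplate kept apart
from the enum proper would presumably look something like this (the
tag names and the LAST_COMMANDTAG macro are illustrative guesses, not
taken from the patch):

```c
/* Hypothetical sketch of the generated commandtag.h */
typedef enum CommandTag
{
	COMMANDTAG_UNKNOWN = 0,
	COMMANDTAG_INSERT,
	COMMANDTAG_SELECT,
	COMMANDTAG_VACUUM
} CommandTag;

#define FIRST_COMMANDTAG COMMANDTAG_UNKNOWN
#define LAST_COMMANDTAG  COMMANDTAG_VACUUM
```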

--
tcop/dest.c

+ * We no longer display LastOid, but to preserve the wire protocol,
+ * we write InvalidOid where the LastOid used to be written. For
+ * efficiency in the snprintf(), hard-code InvalidOid as zero.

Hmm, is hard-coding zero going to make any difference here?
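A quick sanity check suggests the two spellings produce identical
bytes on the wire, so any win would be limited to skipping one %u
conversion in snprintf(). A minimal sketch of the comparison (function
names are mine, not the patch's):

```c
#include <stdio.h>

#define InvalidOid 0u			/* stand-in for the real typedef'd Oid */

/* The "INSERT <oid> <rows>" shape follows the CommandComplete tag
 * format; since the oid slot is always InvalidOid now, the patch
 * hard-codes it as a literal 0 in the format string. */
static void
format_tag_hardcoded(char *buf, size_t len, unsigned long long nprocessed)
{
	snprintf(buf, len, "INSERT 0 %llu", nprocessed);
}

static void
format_tag_symbolic(char *buf, size_t len, unsigned long long nprocessed)
{
	snprintf(buf, len, "INSERT %u %llu", InvalidOid, nprocessed);
}
```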

--
John Naylor https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
