Re: Let file_fdw access COPY FROM PROGRAM

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PG <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Let file_fdw access COPY FROM PROGRAM
Date: 2016-09-06 18:12:59
Message-ID: CADkLM=dx9Q5difswB7AMRcmfk3AaDd9bVYRxA1iAOiy3qF5+gA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
wrote:

>
> I am not familiar with win32 stuff too, so I don't have much to say about
> that. Maybe someone else can chime in to help with that.
>

The regressions basically *can't* test this because we'd need a shell
command we know works on any architecture.

>
> About the patch:
>
> * applies cleanly, compiles fine
> * basic functionality seems to work (have not done any extensive tests
> though)
>
>
> - Specifies the file to be read. Required. Must be an absolute path
> name.
> + Specifies the file to be read. Must be an absolute path name.
> + Either <literal>filename</literal> or <literal>program</literal>
> must be
> + specified. They are mutually exclusive.
> + </para>
> + </listitem>
> + </varlistentry>
>
> The note about filename and program being mutually exclusive could be
> placed at the end of list of options. Or perhaps mention this note as
> part of the description of program option, because filename is already
> defined by that point.
>
> + <listitem>
> + <para>
> + Specifies the command to executed.
>
> s/to executed/to be executed/g
>

Correct. I will fix that when other issues below are resolved.

>
> + Either <literal>program</literal> or <literal>filename</literal>
> must be
> + specified. They are mutually exclusive.
> </para>
>
> Oh I see this has already been mentioned in program option description.
> Did you intend to specify the same in both filename and program
> descriptions?
>

It's been a while since I repackaged Adam's code, but generally I'm in
favor of some redundancy if the two mutually exclusive things are
documented far enough apart.

>
> @@ -97,8 +99,9 @@ typedef struct FileFdwPlanState
> typedef struct FileFdwExecutionState
> {
> char *filename; /* file to read */
> - List *options; /* merged COPY options, excluding
> filename */
> - CopyState cstate; /* state of reading file */
> + char *program; /* program to read output from */
> + List *options; /* merged COPY options, excluding
> filename and program */
> + CopyState cstate; /* state of reading file or program */
>
> Have you considered a Boolean flag is_program instead of char **program
> similar to how copy.c does it? (See a related comment further below)
>

Considered it yes, but decided against it when I started to write my
version. When Adam delivered his version of the patch, I noticed he had
made the same design decision. Only one of them will be initialized, and
the boolean will byte align to 4 bytes, so it's the same storage allocated
either way.

Either we're fine with two variables, or we think file_name is poorly
named. I have only a slight preference for the two variables, and defer to
the group for a preference.

>
> - * This is because the filename is one of those options, and we don't
> want
> - * non-superusers to be able to determine which file gets read.
> + * This is because the filename or program string are two of those
> + * options, and we don't want non-superusers to be able to determine
> which
> + * file gets read or what command is run.
>
> I'm no native English speaker, but I wonder if this should read: filename
> or program string *is* one of those options ... OR filename *and* program
> are two of those options ... ? Also, "are two of those options" sounds a
> bit odd to me because I don't see that used as often as "one of those
> whatever". I guess that's quite a bit of nitpicking about a C comment, ;)
>

Given that C comments constitute a large portion of our documentation, I
fully support making them as clear as possible.

I don't remember if this is Adam's comment or mine. Adam - if you're out
there, chime in.

The original paragraph was:

Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read.
In principle non-superusers could be allowed to change the other options,
but that's not supported at present.

How does this paragraph sound?:

Changing table-level options requires superuser privileges, for security
reasons: only a superuser should be able to determine which file is read,
or which program is run. In principle non-superusers could be allowed to
change the other options, but that's not supported at present.

> @@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS)
> ereport(ERROR,
> (errcode(ERRCODE_SYNTAX_ERROR),
> errmsg("conflicting or redundant options")));
> + if (program)
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("conflicting or redundant options")));
> filename = defGetString(def);
> }
>
> + else if (strcmp(def->defname, "program") == 0)
> + {
> + if (filename)
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("conflicting or redundant options")));
> + if (program)
> + ereport(ERROR,
> + (errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("conflicting or redundant options")));
> + program = defGetString(def);
> + }
>
> Why does it check for filename here? Also the other way around above.
>

We don't want them to specify both program and filename, nor do we allow 2
programs, or 2 filenames.

> @@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int
> eflags)
> * Create CopyState from FDW options. We always acquire all columns,
> so
> * as to match the expected ScanTupleSlot signature.
> */
> - cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> - filename,
> - false,
> - NIL,
> - options);
> + if(filename)
> + cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> + filename,
> + false,
> + NIL,
> + options);
> + else
> + cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> + program,
> + true,
> + NIL,
> + options)
>
> Like I mentioned above, if there was a is_program Boolean flag instead of
> separate filename and program, this would be just:
>
> + cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> + filename,
> + is_program,
> + NIL,
> + options);
>
> That would get rid of if-else blocks here and a couple of other places.
>

It would, pushing the complexity out to the user. Which could be fine, but
if we do that then "filename" is a misnomer.

>
> diff --git a/contrib/file_fdw/input/file_fdw.source
> b/contrib/file_fdw/input/file_fdw.source
> index 685561f..eae34a4 100644
> --- a/contrib/file_fdw/input/file_fdw.source
> +++ b/contrib/file_fdw/input/file_fdw.source
>
> You forgot to include expected output diffs.
>

Having regression tests for this is extremely problematic, because the
program invoked would need to be an invokable command on any architecture
supported by postgres. I'm pretty sure no such command exists.

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Peter Eisentraut 2016-09-06 18:14:29 Re: Logical Replication WIP
Previous Message Tom Lane 2016-09-06 18:11:49 Re: Vacuum: allow usage of more than 1GB of work mem