Re: Let file_fdw access COPY FROM PROGRAM

From: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
To: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
Cc: PG <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Let file_fdw access COPY FROM PROGRAM
Date: 2016-09-07 01:46:06
Message-ID: 6fc03b16-291a-61e2-cf1d-13aa360b4555@lab.ntt.co.jp
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2016/09/07 3:12, Corey Huinker wrote:
> On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote wrote:
>> I am not familiar with win32 stuff too, so I don't have much to say about
>> that. Maybe someone else can chime in to help with that.
>
> The regressions basically *can't* test this because we'd need a shell
> command we know works on any architecture.

OK.

>> @@ -97,8 +99,9 @@ typedef struct FileFdwPlanState
>> typedef struct FileFdwExecutionState
>> {
>> char *filename; /* file to read */
>> - List *options; /* merged COPY options, excluding
>> filename */
>> - CopyState cstate; /* state of reading file */
>> + char *program; /* program to read output from */
>> + List *options; /* merged COPY options, excluding
>> filename and program */
>> + CopyState cstate; /* state of reading file or program */
>>
>> Have you considered a Boolean flag is_program instead of char **program
>> similar to how copy.c does it? (See a related comment further below)
>
> Considered it yes, but decided against it when I started to write my
> version. When Adam delivered his version of the patch, I noticed he had
> made the same design decision. Only one of them will be initialized, and
> the boolean will byte align to 4 bytes, so it's the same storage allocated
> either way.
>
> Either we're fine with two variables, or we think file_name is poorly
> named. I have only a slight preference for the two variables, and defer to
> the group for a preference.

OK, let's defer it to the committer.

>> - * This is because the filename is one of those options, and we don't
>> want
>> - * non-superusers to be able to determine which file gets read.
>> + * This is because the filename or program string are two of those
>> + * options, and we don't want non-superusers to be able to determine
>> which
>> + * file gets read or what command is run.
>>
>> I'm no native English speaker, but I wonder if this should read: filename
>> or program string *is* one of those options ... OR filename *and* program
>> are two of those options ... ? Also, "are two of those options" sounds a
>> bit odd to me because I don't see that used as often as "one of those
>> whatever". I guess that's quite a bit of nitpicking about a C comment, ;)
>
> Given that C comments constitute a large portion of our documentation, I
> fully support making them as clear as possible.
>
> I don't remember if this is Adam's comment or mine. Adam - if you're out
> there, chime in.
>
> The original paragraph was:
>
> Changing table-level options requires superuser privileges, for security
> reasons: only a superuser should be able to determine which file is read.
> In principle non-superusers could be allowed to change the other options,
> but that's not supported at present.
>
>
> How does this paragraph sound?:
>
> Changing table-level options requires superuser privileges, for security
> reasons: only a superuser should be able to determine which file is read,
> or which program is run. In principle non-superusers could be allowed to
> change the other options, but that's not supported at present.

Hmm, just a little modification would make it better for me:

... for security reasons. For example, only superuser should be able to
determine which file read or which program is run.

Although that could be just me.

>> @@ -257,9 +264,26 @@ file_fdw_validator(PG_FUNCTION_ARGS)
>> ereport(ERROR,
>> (errcode(ERRCODE_SYNTAX_ERROR),
>> errmsg("conflicting or redundant options")));
>> + if (program)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_SYNTAX_ERROR),
>> + errmsg("conflicting or redundant options")));
>> filename = defGetString(def);
>> }
>>
>> + else if (strcmp(def->defname, "program") == 0)
>> + {
>> + if (filename)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_SYNTAX_ERROR),
>> + errmsg("conflicting or redundant options")));
>> + if (program)
>> + ereport(ERROR,
>> + (errcode(ERRCODE_SYNTAX_ERROR),
>> + errmsg("conflicting or redundant options")));
>> + program = defGetString(def);
>> + }
>>
>> Why does it check for filename here? Also the other way around above.
>
> We don't want them to specify both program and filename, nor do we allow 2
> programs, or 2 filenames.

Ah, I forgot about the mutually exclusive options part.

>> @@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int
>> eflags)
>> * Create CopyState from FDW options. We always acquire all columns,
>> so
>> * as to match the expected ScanTupleSlot signature.
>> */
>> - cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> - filename,
>> - false,
>> - NIL,
>> - options);
>> + if(filename)
>> + cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> + filename,
>> + false,
>> + NIL,
>> + options);
>> + else
>> + cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> + program,
>> + true,
>> + NIL,
>> + options)
>>
>> Like I mentioned above, if there was a is_program Boolean flag instead of
>> separate filename and program, this would be just:
>>
>> + cstate = BeginCopyFrom(node->ss.ss_currentRelation,
>> + filename,
>> + is_program,
>> + NIL,
>> + options);
>>
>> That would get rid of if-else blocks here and a couple of other places.
>
> It would, pushing the complexity out to the user. Which could be fine, but
> if we do that then "filename" is a misnomer.

This is an internal state so I'm not sure how this would be pushing
complexity out to the user. Am I perhaps misreading what you said?

What a user sees is that there are two separate foreign table options -
filename and program. That we handle them internally using a string to
identify either file or program and a Boolean flag to show which of the
two is just an internal implementation detail.

COPY does it that way internally and I just saw that psql's \copy does it
the same way too. In the latter's case, following is the options struct
(src/bin/psql/copy.c):

struct copy_options
{
char *before_tofrom; /* COPY string before TO/FROM */
char *after_tofrom; /* COPY string after TO/FROM filename */
char *file; /* NULL = stdin/stdout */
bool program; /* is 'file' a program to popen? */
bool psql_inout; /* true = use psql stdin/stdout */
bool from; /* true = FROM, false = TO */
};

But as you said above, this could be deferred to the committer.

>> diff --git a/contrib/file_fdw/input/file_fdw.source
>> b/contrib/file_fdw/input/file_fdw.source
>> index 685561f..eae34a4 100644
>> --- a/contrib/file_fdw/input/file_fdw.source
>> +++ b/contrib/file_fdw/input/file_fdw.source
>>
>> You forgot to include expected output diffs.
>
> Having regression tests for this is extremely problematic, because the
> program invoked would need to be an invokable command on any architecture
> supported by postgres. I'm pretty sure no such command exists.

Craig and Michael elsewhere suggested something about adding TAP tests. If
that helps the situation, maybe you could.

I will mark the CF entry status to "Waiting on author" till you post a new
patch.

Thanks,
Amit

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vinayak 2016-09-07 01:54:46 Re: Transactions involving multiple postgres foreign servers
Previous Message Peter Geoghegan 2016-09-07 01:38:19 Re: ICU integration