Re: Let file_fdw access COPY FROM PROGRAM

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
Cc: PG <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Let file_fdw access COPY FROM PROGRAM
Date: 2016-09-07 03:29:28
Message-ID: CADkLM=fw2AvNKxbMs3yNNC1CN8_5=2MOY3X_sjVjqRdWeXV6qA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Sep 6, 2016 at 9:46 PM, Amit Langote <Langote_Amit_f8(at)lab(dot)ntt(dot)co(dot)jp>
wrote:

> On 2016/09/07 3:12, Corey Huinker wrote:
> > On Fri, Sep 2, 2016 at 5:07 AM, Amit Langote wrote:
> >> I am not familiar with win32 stuff too, so I don't have much to say
> about
> >> that. Maybe someone else can chime in to help with that.
> >
> > The regressions basically *can't* test this because we'd need a shell
> > command we know works on any architecture.
>
> OK.
>

Well...maybe not, depending on what Craig and other can do to educate me
about the TAP tests.

> >
> > Changing table-level options requires superuser privileges, for security
> > reasons: only a superuser should be able to determine which file is read,
> > or which program is run. In principle non-superusers could be allowed to
> > change the other options, but that's not supported at present.
>
> Hmm, just a little modification would make it better for me:
>
> ... for security reasons. For example, only superuser should be able to
> determine which file read or which program is run.
>
> Although that could be just me.
>

In this case "determine" is unclear whether a non-superuser can set the
program to be run, or is capable of knowing which program is set to be run
by the fdw.

We may want some more opinions on what is the most clear.

>
> >> @@ -632,11 +671,18 @@ fileBeginForeignScan(ForeignScanState *node, int
> >> eflags)
> >> * Create CopyState from FDW options. We always acquire all
> columns,
> >> so
> >> * as to match the expected ScanTupleSlot signature.
> >> */
> >> - cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> >> - filename,
> >> - false,
> >> - NIL,
> >> - options);
> >> + if(filename)
> >> + cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> >> + filename,
> >> + false,
> >> + NIL,
> >> + options);
> >> + else
> >> + cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> >> + program,
> >> + true,
> >> + NIL,
> >> + options)
> >>
> >> Like I mentioned above, if there was a is_program Boolean flag instead
> of
> >> separate filename and program, this would be just:
> >>
> >> + cstate = BeginCopyFrom(node->ss.ss_currentRelation,
> >> + filename,
> >> + is_program,
> >> + NIL,
> >> + options);
> >>
> >> That would get rid of if-else blocks here and a couple of other places.
> >
> > It would, pushing the complexity out to the user. Which could be fine,
> but
> > if we do that then "filename" is a misnomer.
>
> This is an internal state so I'm not sure how this would be pushing
> complexity out to the user. Am I perhaps misreading what you said?
>

Indeed, it is internal state. Maybe we rename the variable file_command or
something.

>
> What a user sees is that there are two separate foreign table options -
> filename and program. That we handle them internally using a string to
> identify either file or program and a Boolean flag to show which of the
> two is just an internal implementation detail.
>
> COPY does it that way internally and I just saw that psql's \copy does it
> the same way too. In the latter's case, following is the options struct
> (src/bin/psql/copy.c):
>
> struct copy_options
> {
> char *before_tofrom; /* COPY string before TO/FROM */
> char *after_tofrom; /* COPY string after TO/FROM filename */
> char *file; /* NULL = stdin/stdout */
> bool program; /* is 'file' a program to popen? */
> bool psql_inout; /* true = use psql stdin/stdout */
> bool from; /* true = FROM, false = TO */
> };
>
> But as you said above, this could be deferred to the committer.
>

Yeah, and that made for zero storage savings: a char pointer which is never
assigned a string takes up as much space as a 4-byte-aligned boolean. And
the result is that "file" really means program, which I found slightly
awkward.

> >> diff --git a/contrib/file_fdw/input/file_fdw.source
> >> b/contrib/file_fdw/input/file_fdw.source
> >> index 685561f..eae34a4 100644
> >> --- a/contrib/file_fdw/input/file_fdw.source
> >> +++ b/contrib/file_fdw/input/file_fdw.source
> >>
> >> You forgot to include expected output diffs.
> >
> > Having regression tests for this is extremely problematic, because the
> > program invoked would need to be an invokable command on any architecture
> > supported by postgres. I'm pretty sure no such command exists.
>
> Craig and Michael elsewhere suggested something about adding TAP tests. If
> that helps the situation, maybe you could.
>

Yeah, I need to educate myself about those.

>
> I will mark the CF entry status to "Waiting on author" till you post a new
> patch.
>
>
Thanks.

>
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Corey Huinker 2016-09-07 03:37:55 Re: Let file_fdw access COPY FROM PROGRAM
Previous Message Craig Ringer 2016-09-07 03:24:07 Re: Let file_fdw access COPY FROM PROGRAM