Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY

From: Fujii Masao <masao(dot)fujii(at)gmail(dot)com>
To: Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY
Date: 2012-11-14 15:09:56
Message-ID: CAHGQGwHer4czY8Jt0STCGJ5GBNGTTSNBwu-7+K=+U6uK8P8GPw@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Nov 14, 2012 at 8:30 PM, Etsuro Fujita
<fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> wrote:
> I wrote:
>> > From: Tom Lane [mailto:tgl(at)sss(dot)pgh(dot)pa(dot)us]
>>
>> > I wrote:
>> > > "Etsuro Fujita" <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp> writes:
>> > >> I have a question. I think it would be also better to extend the syntax
>> > >> for the SQL COPY command in the same way, ie,
>> > >> COPY foo from '/home/pgsql/decompress.sh /home/pgsql/foo.csv.gz |' with
>> > >> format 'csv'
>> >
>> > > Yeah, sure --- that case is already superuser-only, so why not give it
>> > > the option of being a popen instead of just fopen.
>> >
>> > BTW, one thought that comes to mind is that such an operation is
>> > extremely likely to fail under environments such as SELinux. That's
>> > not necessarily a reason not to do it, but we should be wary of
>> > promising that it will work everywhere. Probably a documentation note
>> > about this would be enough.
>>
>> OK I'll revise the patch.
>
> I've revised the patch. In this version a user can specify hooks for pre- and
> post-processor executables for COPY and \copy in the follwoing way:
>
> $ echo '/bin/gunzip -c $1' > decompress.sh
> $ chmod +x decompress.sh
>
> In the case of the COPY command,
>
> postgres=# COPY foo FROM '/home/pgsql/decompress.sh /home/pgsql/foo.csv.gz'
> WITH (format 'csv');
>
> Also, in the case of the \copy instruction,
>
> postgres=# \copy foo from '/home/pgsql/decompress.sh /home/pgsql/foo.csv.gz'
> with (format 'csv')
>
> As shown in the example above, I've assumed that the syntax for this option for
> e.g., the COPY command is:
>
> COPY table_name FROM 'progname filename' WITH ...
> COPY table_name TO 'progname filename' WITH ...
>
> Here, progname for COPY IN is the user-supplied program that takes filename as
> its argument and that writes on standard output.

What about further extending the COPY IN syntax to the following?

COPY table_name FROM 'progname [ option, ... ]' WITH ...

I'd just like to execute

COPY vmstat_table FROM 'vmstat' WITH ...

> Also, prgoname for COPY OUT is
> the user-supplied program that reads standard input and writes to filename taken
> as its argument. This makes simple the identification and verification of
> progname and filename.
>
> Todo:
> * Documentation including documentation note about the limitation for
> environments such as SELinux mentioned by Tom.
> * More test
>
> Any comments and suggestions are welcomed.

Isn't it dangerous to allow a user to execute external program in
server side via SQL?

Regards,

--
Fujii Masao

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit kapila 2012-11-14 15:11:16 Re: Proposal for Allow postgresql.conf values to be changed via SQL
Previous Message Bruce Momjian 2012-11-14 15:08:15 Re: Further pg_upgrade analysis for many tables