Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Peter Eisentraut <peter_e(at)gmx(dot)net>, Alvaro Herrera <alvherre(at)2ndquadrant(dot)com>, Simon Riggs <simon(at)2ndquadrant(dot)com>, Fujii Masao <masao(dot)fujii(at)gmail(dot)com>, Etsuro Fujita <fujita(dot)etsuro(at)lab(dot)ntt(dot)co(dot)jp>, Craig Ringer <ringerc(at)ringerc(dot)id(dot)au>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY
Date: 2012-11-14 19:37:52
Message-ID: 23999.1352921872@sss.pgh.pa.us
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Andrew Dunstan <andrew(at)dunslane(dot)net> writes:
> On 11/14/2012 02:05 PM, Peter Eisentraut wrote:
>> Why don't you filter the data before it gets to stdin? Some program is
>> feeding the data to "stdin" on the client side. Why doesn't that do the
>> filtering? I don't see a large advantage in having the data be sent
>> unfiltered to the server and having the server do the filtering.

> Centralization of processing would be one obvious reason.

If I understand correctly, what you're imagining is that the client
sources data to a COPY FROM STDIN type of command, then the backend
pipes that out to stdin of some filtering program, which it then reads
the stdout of to get the data it processes and stores.

We could in principle make that work, but there are some pretty serious
implementation problems: popen doesn't do this so we'd have to cons up
our own fork and pipe setup code, and we would have to write a bunch of
asynchronous processing logic to account for the possibility that the
filter program doesn't return data in similar-size chunk to what it
reads. (IOW, it will never be clear when to try to read data from the
filter and when to try to write data to it.)

I think it's way too complicated for the amount of functionality you'd
get. As Peter says, there's no strong reason not to do such processing
on the client side. In fact there are pretty strong reasons to prefer
to do it there, like not needing database superuser privilege to invoke
the filter program.

What I'm imagining is a very very simple addition to COPY that just
allows it to execute popen() instead of fopen() to read or write the
data source/sink. What you suggest would require hundreds of lines and
create many opportunities for new bugs.

regards, tom lane

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2012-11-14 19:49:31 Re: Further pg_upgrade analysis for many tables
Previous Message Andrew Dunstan 2012-11-14 19:22:11 Re: WIP patch: add (PRE|POST)PROCESSOR options to COPY