Re: Let file_fdw access COPY FROM PROGRAM

From: Corey Huinker <corey(dot)huinker(at)gmail(dot)com>
To: Craig Ringer <craig(at)2ndquadrant(dot)com>
Cc: PG <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Let file_fdw access COPY FROM PROGRAM
Date: 2016-06-03 07:37:02
Message-ID: CADkLM=eW701r3JMDf3XeznY44XK8aa+hvaVCV74zpK3BGKn7Gg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jun 3, 2016 at 1:06 AM, Craig Ringer <craig(at)2ndquadrant(dot)com> wrote:

> On 3 June 2016 at 04:48, Corey Huinker <corey(dot)huinker(at)gmail(dot)com> wrote:
>
>> A while back, there was a push to make COPY gzip-aware. That didn't
>> happen, but COPY FROM PROGRAM did, and it scratches the same itch.
>>
>
>
>> - writing unwanted columns to a temp/work table via COPY, and then
>> immediately re-reading them
>>
>
> Without wanting to take away from the value of letting file FDW access
> FROM PROGRAM, I think this really merits a solution that applies to COPY as
> well. Variants on "how do I COPY just some columns from a CSV" is a real
> FAQ, and it doesn't seem like it'd be excessively hard to support. We'd
> just need some way to pass a list of column-ordinals or header-names.
>
> Not asking you to do that work, just pointing out that this particular
> issue applies to COPY its self as well.
>

I agree, they are two different but slightly overlapping issues. file_fdw
needs a way to handle compressed/filtered files, and COPY needs the ability
to skip columns. But right now COPY is all about getting the shape of the
input from the shape of the destination table.

I had the bright idea of creating a custom datatype, call it "skip" /
"filler" / "nada" / "devnull" or something like that, that would map any
and all values to NULL, thus allowing COPY to naively "store" the unwanted
values into nothingness.

That idea falls apart when it hits InputFunctionCall() in fmgr.c, which
prevents any text string from coercing into NULL. I didn't think I was up
to the task of making InputFunctionCall respect a special case type (void?)
or somehow promoting the pseudo type void into a legit column type (unknown
side-effects there). I did create a type that coerced all input into the
empty string, but the disk usage of that was still pretty significant.

So maybe if there was a subspecies of COPY that returned SETOF RECORD:

SELECT important_column1, important_column2
FROM copy_srf( program := 'zcat filename', format := 'csv', header := true
) AS t(bogus_col1 text, important_column1 integer, important_column2 date,
bogus_col2 text, ... );

That would allow COPY to keep it's current efficiency and simplicity, while
(hopefully) avoiding the unwanted data from ever hitting disk. It would
also be vaguely reminiscent of SQL*Loader.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Kyotaro HORIGUCHI 2016-06-03 08:30:55 Re: Parallel pg_dump's error reporting doesn't work worth squat
Previous Message Albe Laurenz 2016-06-03 07:17:58 Re: Prepared statements and generic plans