Re: Allow COPY's 'text' format to output a header

From: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
To: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
Cc: Daniel Verite <daniel(at)manitou-mail(dot)org>, Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Simon Muller <samullers(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow COPY's 'text' format to output a header
Date: 2018-05-15 16:06:12
Message-ID: 23653.1526400372@sss.pgh.pa.us
Lists: pgsql-hackers

Isaac Morland <isaac(dot)morland(at)gmail(dot)com> writes:
> On 15 May 2018 at 10:26, Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:
>> Andrew Dunstan wrote:
>>> I'm not necessarily opposed to this, but I'm not certain about the use
>>> case either.

>> The downside is that it would create the need, when using COPY FROM,
>> to know whether an input file was generated with or without header,
>> and a hazard on mistakes.
>> If you say it was and it wasn't, you quietly lose the first row of data.
>> If you say it wasn't and in fact it was, either there's a
>> datatype mismatch or you quietly get a spurious row of data.

> Just to be clear, we're talking about my "header match" feature, not the
> basic idea of allowing a header in text format?

AFAICS, Daniel's just reacting to the basic idea of a header line.
I agree that by itself that's not worth much. However, if we added
your proposed option to insist that the column names match during COPY
IN, I think that that could have some value. It would allow
forestalling one common type of pilot error, i.e., copying the wrong file
entirely. (It'd also prevent copying in data that has the wrong column
order, but I think that's a less common scenario. I might be wrong
about that.)
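
To make the header-match idea concrete, here is a minimal sketch in Python of what such a check amounts to. It is an illustration only, not PostgreSQL code: the function name, the exception class, and the assumption of a plain tab-delimited file with no escaping are all hypothetical. The first line is compared against the target column names, in order, and the load is refused before any data row is consumed if they differ.

```python
import io


class HeaderMismatch(ValueError):
    """Hypothetical error: the header line does not match the target columns."""


def check_header_and_read(stream, expected_columns):
    """Return the data rows of a tab-delimited stream whose first line is a header.

    Assumption for this sketch: fields are separated by tabs and contain no
    escaped tabs or newlines. Real COPY text format has escaping rules that
    are ignored here.
    """
    lines = iter(stream)
    first = next(lines, None)
    if first is None:
        raise HeaderMismatch("input is empty, expected a header line")

    header = first.rstrip("\n").split("\t")
    expected = list(expected_columns)
    # Names and order must both match; a reordered header is rejected too.
    if header != expected:
        raise HeaderMismatch(
            f"header {header} does not match target columns {expected}"
        )

    # Only reached when the header matched, so no data row is lost or invented.
    return [line.rstrip("\n").split("\t") for line in lines]


if __name__ == "__main__":
    good = io.StringIO("id\tname\n1\talice\n2\tbob\n")
    print(check_header_and_read(good, ["id", "name"]))

    wrong_file = io.StringIO("sku\tprice\nA1\t9.99\n")
    try:
        check_header_and_read(wrong_file, ["id", "name"])
    except HeaderMismatch as err:
        print("rejected:", err)
```

The comparison is deliberately strict. Anything looser, such as matching names in any order, moves toward the more flexible variants discussed further down and gives up part of the protection against loading the wrong file.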

> One can imagine extensions of the idea: for example, the header could
> actually be used to identify the columns, so the column order in the file
> doesn't matter. There could also be an "AS" syntax to allow the target
> field names to be different from the field names in the header. I have
> occasionally found myself wanting to ignore certain columns of the file.
> But these are all significantly more complicated than just looking at the
> header and requiring it to match the target field names.

Yeah, and every bit of flexibility you add raises the chance of an
undetected error. COPY isn't intended as a general ETL facility,
so I'd mostly be -1 on adding such things. But I can see the value
of confirming that you're copying the right file, and a header match
check would go a long way towards doing that.

regards, tom lane
