Re: Allow COPY's 'text' format to output a header

From: Isaac Morland <isaac(dot)morland(at)gmail(dot)com>
To: Daniel Verite <daniel(at)manitou-mail(dot)org>
Cc: Andrew Dunstan <andrew(dot)dunstan(at)2ndquadrant(dot)com>, Simon Muller <samullers(at)gmail(dot)com>, Michael Paquier <michael(at)paquier(dot)xyz>, David Steele <david(at)pgmasters(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Allow COPY's 'text' format to output a header
Date: 2018-05-15 15:52:54
Message-ID: CAMsGm5f_SMZHwja=a+R1ikdrWn3Nyp5_xaWpx_K3EdV-XkdQAg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 15 May 2018 at 10:26, Daniel Verite <daniel(at)manitou-mail(dot)org> wrote:

> Andrew Dunstan wrote:
>
> > I'm not necessarily opposed to this, but I'm not certain about the use
> > case either.
>
> +1.
> The downside is that it would create the need, when using COPY TO,
> to know whether an input file was generated with or without header,
> and a hazard on mistakes.
> If you say it was and it wasn't, you quietly loose the first row of data.
> If you say it wasn't and in fact it was, either there's a
> datatype mismatch or you quietly get a spurious row of data.
>

Just to be clear, we're talking about my "header match" feature, not the
basic idea of allowing a header in text format?

You already need to know whether or not there is a header, no matter what:
there is no way to avoid needing to know the format of the data to be
imported. And certainly if "header" is an option, one has to know whether
or not to set it in any given situation.

The "header match" helps ensure the file is the right one by requiring the
header contents to match the field names, rather than just being thrown
away.

I don't view it as a way to avoid pre-defining the table. It just increases
the chance that the wrong file won't load but will instead trigger an error
condition immediately.

Note that this advantage includes what happens if you specify header but
the file has no header: as long as you actually specified header match, the
error will be caught unless the first row of actual data happens to match
the field names, which is almost always highly unlikely and frequently
impossible (e.g., a person with firstname "firstname", surname "surname",
birthday "birthday" and so on).

One can imagine extensions of the idea: for example, the header could
actually be used to identify the columns, so the column order in the file
doesn't matter. There could also be an "AS" syntax to allow the target
field names to be different from the field names in the header. I have
occasionally found myself wanting to ignore certain columns of the file.
But these are all significantly more complicated than just looking at the
header and requiring it to match the target field names.

If one specifies no header but there actually is a header in the file, then
loading will fail in many cases but it depends on what the header in the
file looks like. This part is unaffected by my idea.

> This complication should be balanced by some advantage.
> What can we do with the header?
> If you already have the table ready to COPY in, you don't
> need that information. The only reason why COPY TO
> needs to know about the header is to throw it away.
> And if you don't have the table created yet, a header
> with just the column names is hardly sufficient to create it,
> isn't it?
>

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tom Lane 2018-05-15 15:58:04 Re: Windows build broken starting at da9b580d89903fee871cf54845ffa2b26bda2e11
Previous Message Mark Dilger 2018-05-15 15:33:26 Windows build broken starting at da9b580d89903fee871cf54845ffa2b26bda2e11