Re: Add header support to text format and matching feature

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Julien Rouhaud <rjuju123(at)gmail(dot)com>, Peter Eisentraut <peter(dot)eisentraut(at)enterprisedb(dot)com>
Cc: Greg Stark <stark(at)mit(dot)edu>, Rémi Lapeyre <remi(dot)lapeyre(at)lenstra(dot)fr>, Zhihong Yu <zyu(at)yugabyte(dot)com>, David Steele <david(at)pgmasters(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, vignesh C <vignesh21(at)gmail(dot)com>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, Daniel Verite <daniel(at)manitou-mail(dot)org>
Subject: Re: Add header support to text format and matching feature
Date: 2022-06-12 13:36:13
Message-ID: d35e05ac-fde1-0dba-39bf-b30e75082b48@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 2022-06-07 Tu 11:47, Julien Rouhaud wrote:
> Hi,
>
> On Wed, Mar 30, 2022 at 09:11:09AM +0200, Peter Eisentraut wrote:
>> Committed, after some further refinements as discussed.
> While working on nearby code, I found some problems with this feature.
>
> First, probably nitpicking, the HEADER MATCH is allowed for COPY TO, is that
> expected? The documentation isn't really explicit about it, but there's
> nothing to match when exporting data it's a bit surprising. I'm not opposed to
> have HEADER MATCH means HEADER ON for COPY TO, as as-is one can easily reuse
> the commands history, but maybe it should be clearly documented?

I think it makes more sense to have a sanity check to prevent HEADER
MATCH with COPY TO.

>
> Then, apparently HEADER MATCH doesn't let you do sanity checks against a custom
> column list. This one looks like a clear oversight, as something like that
> should be entirely valid IMHO:
>
> CREATE TABLE tbl(col1 int, col2 int);
> COPY tbl (col2, col1) TO '/path/to/file' WITH (HEADER MATCH);
> COPY tbl (col2, col1) FROM '/path/to/file' WITH (HEADER MATCH);
>
> but right now it errors out with:
>
> ERROR: column name mismatch in header line field 1: got "col1", expected "col2"
>
> Note that the error message is bogus if you specify attributes in a
> different order from the relation, as the code is mixing access to the tuple
> desc and access to the raw fields with the same offset.
>
> This also means that it will actually fail to detect a mismatch in the provided
> column list and let you import data in the wrong position as long as the
> datatypes are compatible and the column header in the file are in the correct
> order. For instance:
>
> CREATE TABLE abc (a text, b text, c text);
> INSERT INTO abc SELECT 'a', 'b', 'c';
> COPY abc TO '/path/to/file' WITH (HEADER MATCH);
>
> You can then import the data with any of those:
> COPY abc(c, b, a) TO '/path/to/file' WITH (HEADER MATCH);
> COPY abc(c, a, b) TO '/path/to/file' WITH (HEADER MATCH);
> [...]
> SELECT * FROM abc;
>
> Even worse, if you try to do a COPY ... FROM ... WITH (HEADER ON) on a table
> that has some dropped attribute(s). The current code will access random memory
> as there's no exact attnum / raw field mapping anymore.

Ouch! That certainly needs to be fixed.

>
> I can work on a fix if needed (with some additional regression test to cover
> those cases), but I'm still not sure that having a user provided column list is
> supposed to be accepted or not for the HEADER MATCH. In the meantime I will
> add an open item.
>
>

I think it should, but a temporary alternative would be to forbid HEADER
MATCH with explicit column lists until we can make it work right.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jim Popovitch 2022-06-12 14:05:59 connection failures on forked processes
Previous Message Aleksander Alekseev 2022-06-12 10:42:13 Quick question regarding HeapTupleHeaderData.t_ctid