From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Extend COPY FROM with HEADER <integer> to skip multiple lines |
Date: | 2025-06-09 13:28:20 |
Message-ID: | daf30bdc-0735-4a17-a3f2-7e41de2667c3@dunslane.net |
Lists: | pgsql-hackers |
On 2025-06-09 Mo 4:27 AM, Fujii Masao wrote:
>
>
> On 2025/06/09 16:10, Shinya Kato wrote:
>> Hi hackers,
>>
>> I'd like to propose a new feature for the COPY FROM command to allow
>> skipping multiple header lines when loading data. This enhancement
>> would enable files with multi-line headers to be loaded without any
>> preprocessing, which would significantly improve usability.
>>
>> In real-world scenarios, it's common for data files to contain
>> multiple header lines, such as file descriptions or column
>> explanations. Currently, the COPY command cannot load these files
>> directly, which requires users to preprocess them with tools like sed
>> or tail.
>>
>> Although you can use "COPY t FROM PROGRAM 'tail -n +3 /path/to/file'",
>> some environments do not have the tail command available.
>> Additionally, this approach requires superuser privileges or
>> membership in the pg_execute_server_program role.
>>
>> This feature also has precedent in other major RDBMS:
>> - MySQL: LOAD DATA ... IGNORE N LINES [1]
>> - SQL Server: BULK INSERT … WITH (FIRSTROW=N) [2]
>> - Oracle SQL*Loader: sqlldr … SKIP=N [3]
>>
>> I have not yet created a patch, but I am willing to implement an
>> extension for the HEADER option. I would like to discuss the
>> specification first.
>>
>> The specification I have in mind is as follows:
>> - Command: COPY FROM
>> - Formats: text and csv
>> - Option syntax: HEADER [ boolean | integer | MATCH ] (Extend the
>> HEADER option to accept an integer value in addition to the existing
>> boolean values and the MATCH keyword.)
>> - Behavior: Let N be the specified integer.
>> - If N < 0, raise an error.
>> - If N = 0 or 1, behave the same as HEADER false or HEADER true, respectively.
>> - If N > 1, skip the first N rows.
>>
>> Thoughts?
>
> I generally like the idea.
>
> However, a similar proposal was made earlier [1], and seemingly
> some hackers weren't in favor of it. It's probably worth reading
> that thread to understand the previous concerns.
>
> Regards,
>
>
> [1]
> https://postgr.es/m/CALAY4q8nGSXp0P5uf56vn-mD7reWqZP5k6PS1CGUm26X4FsYJA@mail.gmail.com
I think the earlier proposal went rather further than this one, which I
suspect can be implemented fairly cheaply.
I don't have terribly strong feelings about it, but matching a feature
implemented elsewhere has some attraction if it can be done easily.
OTOH I'm a bit curious to know what software produces multi-line CSV
headers.
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com