Re: Extend COPY FROM with HEADER <integer> to skip multiple lines

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>, Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Extend COPY FROM with HEADER <integer> to skip multiple lines
Date: 2025-06-09 13:28:20
Message-ID: daf30bdc-0735-4a17-a3f2-7e41de2667c3@dunslane.net
Lists: pgsql-hackers


On 2025-06-09 Mo 4:27 AM, Fujii Masao wrote:
>
>
> On 2025/06/09 16:10, Shinya Kato wrote:
>> Hi hackers,
>>
>> I'd like to propose a new feature for the COPY FROM command to allow
>> skipping multiple header lines when loading data. This enhancement
>> would enable files with multi-line headers to be loaded without any
>> preprocessing, which would significantly improve usability.
>>
>> In real-world scenarios, it's common for data files to contain
>> multiple header lines, such as file descriptions or column
>> explanations. Currently, the COPY command cannot load these files
>> directly, which requires users to preprocess them with tools like sed
>> or tail.
>>
>> Although you can use "COPY t FROM PROGRAM 'tail -n +3 /path/to/file'",
>> some environments do not have the tail command available.
>> Additionally, this approach requires superuser privileges or
>> membership in the pg_execute_server_program role.
>>
>> This feature also has precedent in other major RDBMS:
>> - MySQL: LOAD DATA ... IGNORE N LINES [1]
>> - SQL Server: BULK INSERT … WITH (FIRSTROW=N) [2]
>> - Oracle SQL*Loader: sqlldr … SKIP=N [3]
>>
>> I have not yet created a patch, but I am willing to implement an
>> extension for the HEADER option. I would like to discuss the
>> specification first.
>>
>> The specification I have in mind is as follows:
>> - Command: COPY FROM
>> - Formats: text and csv
>> - Option syntax: HEADER [ boolean | integer | MATCH] (Extend the
>> HEADER option to accept an integer value in addition to the existing
>> boolean and MATCH keywords.)
>> - Behavior: Let N be the specified integer.
>>    - If N < 0, raise an error.
>>    - If N = 0 or 1, same behavior as when a boolean is specified.
>>    - If N > 1, skip the first N rows.
>>
>> Thoughts?
>
> I generally like the idea.
>
> However, a similar proposal was made earlier [1], and seemingly
> some hackers weren't in favor of it. It's probably worth reading
> that thread to understand the previous concerns.
>
> Regards,
>
>
> [1]
> https://postgr.es/m/CALAY4q8nGSXp0P5uf56vn-mD7reWqZP5k6PS1CGUm26X4FsYJA@mail.gmail.com

I think the earlier proposal went rather further than this one, which I
suspect can be implemented fairly cheaply.
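
To make the quoted specification concrete, here is a rough Python sketch of
the proposed HEADER semantics. It is an illustration only, not taken from any
patch: the names header_lines_to_skip and data_lines are hypothetical, MATCH
is left out, and it assumes that "rows" means physical lines (a header field
containing a quoted newline would need real CSV parsing).

```python
import io


def header_lines_to_skip(header):
    """Map a HEADER option value to the number of leading lines to skip.

    Hypothetical helper following the proposal: booleans keep their
    current meaning, a negative integer is an error, and a non-negative
    integer N skips N lines (so 0 and 1 behave like false and true).
    """
    # bool is a subclass of int in Python, so it must be checked first.
    if isinstance(header, bool):
        return 1 if header else 0
    if isinstance(header, int):
        if header < 0:
            raise ValueError("HEADER must be a non-negative integer")
        return header
    raise TypeError("unsupported HEADER value in this sketch")


def data_lines(stream, header):
    """Yield the lines of stream that remain after skipping the header."""
    skip = header_lines_to_skip(header)
    for lineno, line in enumerate(stream):
        if lineno >= skip:
            yield line.rstrip("\n")


# Example: a file with a description line plus a column-name line.
sample = "# exported by some tool\nid,name\n1,a\n2,b\n"
rows = list(data_lines(io.StringIO(sample), 2))
```

With HEADER 2, rows holds only the two data lines; with HEADER true or
HEADER 1, the column-name line would be skipped but the description line
would have to be absent from the file.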

I don't have terribly strong feelings about it, but matching a feature
implemented elsewhere has some attraction if it can be done easily.

OTOH I'm a bit curious to know what software produces multi-line CSV
headers.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
