Re: Extend COPY FROM with HEADER <integer> to skip multiple lines

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Extend COPY FROM with HEADER <integer> to skip multiple lines
Date: 2025-06-26 07:36:18
Message-ID: 23fccda0-fd64-44bf-ad57-16a804996a68@oss.nttdata.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2025/06/26 14:35, Shinya Kato wrote:
>
> > > So it seems better for you to implement the patch at first and then
> > > check the performance overhead etc if necessary.
> >
> > Thank you for your advice. I will create a patch.
>
> I created a patch.

Thanks for the patch!

> As you can see from the patch, I believe the performance impact is negligible. The only changes were to modify how the HEADER option is stored and to add a loop that skips the specified number of header lines when parsing the options.
>
> The design is such that if a HEADER value larger than the number of lines in the file is specified, the command will complete with zero rows loaded and will not return an error. This is the same behavior as specifying HEADER true for a CSV file that has zero rows.

Sounds good to me.

> And I will add this patch for the next CF.
>
> Thoughts?

When I compiled with the patch applied, I got the following warning:

copyfromparse.c:834:20: error: ‘done’ may be used uninitialized [-Werror=maybe-uninitialized]
834 | if (done)
| ^

+ lines are discarded. An integer value greater than 1 is only valid for
+ <command>COPY FROM</command> commands.

This might be slightly misleading since 0 is also a valid value for COPY FROM.

+ for (int i = 0; i < cstate->opts.header.skip_lines; i++)
+ {
+ cstate->cur_lineno++;
+ done = CopyReadLine(cstate, is_csv);
+ }

If "done" is true, shouldn't we break out of the loop immediately?
Otherwise, with a large HEADER value, we could end up wasting a lot of
cycles unnecessarily.

+defGetCopyHeaderOption(DefElem *def, bool is_from)
{
+ CopyHeaderOption option;

To avoid repeating the same code like "option.match = false" in every case,
how about initializing the struct like "option = {false, 1}"?

+ (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+ errmsg("HEADER must be non-negative integer")));

This message could be confusing since HEADER can also accept a Boolean or "match".
Maybe it's better to use the same message as "%s requires a Boolean value, integer value,
or \"match\"",? "integer value"? If so, it's also better to use "non-negative integer"
instead of just "integer".

Regards,

--
Fujii Masao
NTT DATA Japan Corporation

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2025-06-26 08:14:57 Re: PG18 protocol version
Previous Message Jim Jones 2025-06-26 07:22:11 Re: display hot standby state in psql prompt