Re: Extend COPY FROM with HEADER <integer> to skip multiple lines

From: Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
To: Shinya Kato <shinya11(dot)kato(at)gmail(dot)com>, Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp>
Cc: Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Extend COPY FROM with HEADER <integer> to skip multiple lines
Date: 2025-07-02 07:48:15
Message-ID: 1a6f670f-7f04-4f4b-a166-cc3c952d056f@oss.nttdata.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 2025/07/02 14:39, Shinya Kato wrote:
> On Mon, Jun 30, 2025 at 4:24 PM Yugo Nagata <nagata(at)sraoss(dot)co(dot)jp> wrote:
>>
>>
>>>> I have a few minor comment on the patch.
>>>>
>>>> + if (ival < 0)
>>>> + ereport(ERROR,
>>>> + (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>>> + errmsg("%s requires a Boolean value, a non-negative "
>>>> + "integer, or the string \"match\"",
>>>> + def->defname)));
>>>>
>>>> ereport(ERROR,
>>>> (errcode(ERRCODE_SYNTAX_ERROR),
>>>> - errmsg("%s requires a Boolean value or \"match\"",
>>>> + errmsg("%s requires a Boolean value, a non-negative integer, "
>>>> + "or the string \"match\"",
>>>> def->defname)));
>>>>
>>>> These two pieces of code raise the same error, but with different error codes:
>>>> one returns ERRCODE_INVALID_PARAMETER_VALUE, while the other returns ERRCODE_SYNTAX_ERROR.
>>>>
>>>> I believe the former is more appropriate, although the existing code uses the
>>>> latter. In any case, the error codes should be consistent.
>>>
>>> I'm not sure there's an actual rule like "the error code must match
>>> if the error message is the same." But if using the same message with
>>> a different error code is confusing, I'm fine with changing
>>> the earlier message. For example:
>>>
>>> (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>>> - errmsg("%s requires a Boolean value, a non-negative "
>>> - "integer, or the string \"match\"",
>>> + errmsg("a negative integer value cannot be "
>>> + "specified for %s",
>>> def->defname)));
>>
>> Fair point. There might not be any explicit rule.
>> However, I feel it somewhat confusing.
>>
>> After looking into the code, it seems that ERRCODE_SYNTAX_ERROR tends to be used
>> for invalid keywords or parameter types, while ERRCODE_INVALID_PARAMETER_VALUE
>> is typically used for values that are out of the acceptable range.
>>
>> So, the proposed change seems reasonable to me.
>
> Thank you for the review. The change looks good to me, too. A new
> patch is attached.

Thanks for updating the patch!

>
>> Regarding the documentation, how about explicitly stating that when MATCH is specified, only
>> the first line is skipped? While this may seem obvious, it’s worth clarifying, as the semantics
>> of the HEADER option have become a bit more complex with this change.
>
> Agreed. I have updated the documentation as follows:
>
> + lines are discarded. If the option is set to <literal>MATCH</literal>,
> + the number and names of the columns in the header line must exactly
> + match those of the table and, in order, after which the header line is
> + discarded; otherwise an error is raised. The <literal>MATCH</literal>

How about making the wording a bit clearer? For example:

If set to MATCH, the first line is discarded, and it must contain column names that
exactly match the table's columns, in both number and order; otherwise, an error is raised.

Also, the phrase "if this option is set to..." is repeated three times in the current text.
For the second and third instances, we could simplify it to just "if set to...".

Regards,

--
Fujii Masao
NTT DATA Japan Corporation

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Daniel Gustafsson 2025-07-02 07:49:05 Re: Explicitly enable meson features in CI
Previous Message Andrei Lepikhov 2025-07-02 07:32:43 Re: Reduce "Var IS [NOT] NULL" quals during constant folding