From: | Shinya Kato <shinya11(dot)kato(at)gmail(dot)com> |
---|---|
To: | Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com> |
Cc: | Dagfinn Ilmari Mannsåker <ilmari(at)ilmari(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Extend COPY FROM with HEADER <integer> to skip multiple lines |
Date: | 2025-06-27 02:55:18 |
Message-ID: | CAOzEurQhbN5Y8u-4Ert1UVBiLSyc1+Q-9AqQs032ZhfQyf-0SQ@mail.gmail.com |
Views: | Whole Thread | Raw Message | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Fri, Jun 27, 2025 at 12:03 AM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com>
wrote:
>
>
> On 2025/06/26 19:12, Shinya Kato wrote:
> > On Thu, Jun 26, 2025 at 4:36 PM Fujii Masao <masao(dot)fujii(at)oss(dot)nttdata(dot)com
> <mailto:masao(dot)fujii(at)oss(dot)nttdata(dot)com>> wrote:
> >
> > On 2025/06/26 14:35, Shinya Kato wrote:
> > >
> > > > > So it seems better for you to implement the patch at first
> and then
> > > > > check the performance overhead etc if necessary.
> > > >
> > > > Thank you for your advice. I will create a patch.
> > >
> > > I created a patch.
> >
> > Thanks for the patch!
> >
> >
> > Thank you for your review.
> >
> > When I compiled with the patch applied, I got the following warning:
> >
> > copyfromparse.c:834:20: error: ‘done’ may be used uninitialized
> [-Werror=maybe-uninitialized]
> > 834 | if (done)
> > | ^
> >
> >
> > Oh, sorry. I missed it.
> > I fixed it to initialize done = false.
> >
> > + lines are discarded. An integer value greater than 1 is only
> valid for
> > + <command>COPY FROM</command> commands.
> >
> >
> > This might be slightly misleading since 0 is also a valid value for
> COPY FROM.
> >
> > That's certainly true. I fixed it below:
> > + lines are discarded. An integer value greater than 1 is not
> allowed for
> > + <command>COPY TO</command> commands.
>
> Thanks for fixing that! However, it's odd that the description for COPY TO
> appears
> under the section starting with "On input." Shouldn't it be under the "On
> output"
> section instead?
>
> Also, I think the entire HEADER option section could use a clearer
> structure.
> How about revising it like this?
>
> ---------------------
> <listitem>
> <para>
> - Specifies that the file contains a header line with the names of
> each
> - column in the file. On output, the first line contains the column
> - names from the table. On input, the first line is discarded when
> this
> - option is set to <literal>true</literal> (or equivalent Boolean
> value).
> - If this option is set to <literal>MATCH</literal>, the number and
> names
> - of the columns in the header line must match the actual column
> names of
> - the table, in order; otherwise an error is raised.
> + On output, if this option is set to <literal>true</literal>
> + (or an equivalent Boolean value), the first line of the output will
> + contain the column names from the table.
> + Integer values <literal>0</literal> and <literal>1</literal> are
> + accepted as Boolean values, but other integers are not allowed for
> + <command>COPY TO</command> commands.
> + </para>
> + <para>
> + On input, if this option is set to <literal>true</literal>
> + (or an equivalent Boolean value), the first line of the input is
> + discarded. If it is set to a non-negative integer, that number of
> + lines are discarded. If the option is set to
> <literal>MATCH</literal>,
> + the number and names of the columns in the header line must exactly
> + match those of the table, in order; otherwise an error is raised.
> + The <literal>MATCH</literal> value is only valid for
> + <command>COPY FROM</command> commands.
> + </para>
> + <para>
> This option is not allowed when using <literal>binary</literal>
> format.
> - The <literal>MATCH</literal> option is only valid for <command>COPY
> - FROM</command> commands.
> </para>
> </listitem>
> ---------------------
>
Yes, your documentation is clearer than mine.
>
> Also, similar to the existing "boolean" type explanation in the COPY docs,
> we should add a corresponding entry for integer, now that it's accepted by
> the HEADER option. The VACUUM docs has a good example of how to phrase
> this.
>
Exactly.
> > Original error message "%s requires a Boolean value, integer value, or
> \"match\"" should also be fixed to "non-negative integer"?
>
> Yes, I think that the message
>
> "%s requires a Boolean value, a non-negative integer, or the string
> \"match\""
>
> is clearer and more accurate.
Thank you, you're right.
> -typedef enum CopyHeaderChoice
> +typedef struct CopyHeaderOption
> {
> - COPY_HEADER_FALSE = 0,
> - COPY_HEADER_TRUE,
> - COPY_HEADER_MATCH,
> -} CopyHeaderChoice;
> + bool match; /* header line must match actual names? */
> + int skip_lines; /* number of lines to skip before
> data */
> +} CopyHeaderOption;
> <snip>
> - CopyHeaderChoice header_line; /* header line? */
> + CopyHeaderOption header; /* header line? */
>
> Wouldn't it be simpler to just use an integer instead of introducing
> CopyHeaderOption? We could use 0 for false, 1 for true, n > 1 for skipping
> lines,
> and -1 for MATCH in that integer.
>
> This would simplify the logic in defGetCopyHeaderOption().
> I've attached a patch that shows this idea in action. Thought?
>
I thought about the approach, and had a minor concern about the following
inconsistency between the option and its internal implementation:
- When the option is set to "MATCH", header_line becomes -1.
- When the option is set to -1, it's an error.
However, your patch is clear, and it seems we don't need to worry about
this concern.
Your patch looks good to me.
--
Best regards,
Shinya Kato
NTT OSS Center
From | Date | Subject | |
---|---|---|---|
Next Message | shveta malik | 2025-06-27 03:20:03 | Re: Logical Replication of sequences |
Previous Message | Zhijie Hou (Fujitsu) | 2025-06-27 02:28:10 | RE: Conflict detection for update_deleted in logical replication |