Re:Reject HEADER with binary and json COPY formats by option presence

From: "Tingchuan Sun" <suntingchuan1996(at)163(dot)com>
To: "Chao Li" <li(dot)evan(dot)chao(at)gmail(dot)com>
Cc: PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>, "Fujii Masao" <masao(dot)fujii(at)gmail(dot)com>, "Shinya Kato" <shinya11(dot)kato(at)gmail(dot)com>, "Andrew Dunstan" <andrew(at)dunslane(dot)net>
Subject: Re:Reject HEADER with binary and json COPY formats by option presence
Date: 2026-06-03 03:16:49
Message-ID: 31ac5168.30dc.19e8b7bd6d1.Coremail.suntingchuan1996@163.com
Views: Whole Thread | Raw Message | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

At 2026-06-03 11:11:29, "Chao Li" <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>Hi,
>
>While testing “file_fdw: Support multi-line HEADER option”, I noticed a small issue.
>
>The doc says that the HEADER option cannot be used with the “binary" or “json" format:
>```
> <varlistentry id="sql-copy-params-header">
> <term><literal>HEADER</literal></term>
> <listitem>
> <para>
> On output, if this option is set to <literal>true</literal>
> (or an equivalent Boolean value), the first line of the output will
> contain the column names from the table.
> Integer values <literal>0</literal> and <literal>1</literal> are
> accepted as Boolean values, but other integers are not allowed for
> <command>COPY TO</command> commands.
> </para>
> <para>
> On input, if this option is set to <literal>true</literal>
> (or an equivalent Boolean value), the first line of the input is
> discarded. If set to a non-negative integer, that number of
> lines are discarded. If set to <literal>MATCH</literal>, the first line
> is discarded, and it must contain column names that exactly match the
> table's columns, in both number and order; otherwise, an error is raised.
> The <literal>MATCH</literal> value is only valid for
> <command>COPY FROM</command> commands.
> </para>
> <para>
> This option is not allowed when using <literal>binary</literal> or <literal>json</literal> format.
> </para>
> </listitem>
> </varlistentry>
>```
>
>However, when I specified "header ‘0", the command did not fail. That means the current behavior depends on the value of the “header" option, not on presence:
>```
>evantest=# create foreign table ft (i int) server fs options (format 'binary', filename '/tmp/ft.bin', header '1');
>ERROR: cannot specify HEADER in BINARY mode
>evantest=# create foreign table ft (i int) server fs options (format 'binary', filename '/tmp/ft.bin', header '0');
>CREATE FOREIGN TABLE
>```
>
>As we can see, "header 1" fails, but header 0" is silently accepted. I don't think this behavior matches what the documentation describes.
>
>For comparison, VACUUM has a similar option. “BUFFER_USAGE_LIMIT" is not allowed with "VACUUM FULL", and a value of 0 means disabling the buffer access strategy:
>```
> <varlistentry>
> <term><literal>BUFFER_USAGE_LIMIT</literal></term>
> <listitem>
> <para>
> Specifies the
> <glossterm linkend="glossary-buffer-access-strategy">Buffer Access Strategy</glossterm>
> ring buffer size for <command>VACUUM</command>. This size is used to
> calculate the number of shared buffers which will be reused as part of
> this strategy. <literal>0</literal> disables use of a
> <literal>Buffer Access Strategy</literal>. If <option>ANALYZE</option>
> is also specified, the <option>BUFFER_USAGE_LIMIT</option> value is used
> for both the vacuum and analyze stages. This option can't be used with
> the <option>FULL</option> option except if <option>ANALYZE</option> is
> also specified. When this option is not specified,
> <command>VACUUM</command> uses the value from
> <xref linkend="guc-vacuum-buffer-usage-limit"/>. Higher settings can
> allow <command>VACUUM</command> to run more quickly, but having too
> large a setting may cause too many other useful pages to be evicted from
> shared buffers. The minimum value is <literal>128 kB</literal> and the
> maximum value is <literal>16 GB</literal>.
> </para>
> </listitem>
> </varlistentry>
>```
>
>Using BUFFER_USAGE_LIMIT 0 with FULL is still rejected:
>```
>evantest=# vacuum (full, BUFFER_USAGE_LIMIT 0) t;
>ERROR: BUFFER_USAGE_LIMIT cannot be specified for VACUUM FULL
>```
>
>So VACUUM rejects BUFFER_USAGE_LIMIT based on the presence of the option, not its value. I think we should keep the behavior consistent here, and VACUUM's behavior better matches the documentation. Otherwise, I am afraid this could encourage more inconsistencies in the future.
>
>The fix is straightforward. Since we already have the “header_specified" variable to indicate whether the option is present, we can check “header_specified" instead.
>
>I reported a similar issue for the COPY command earlier in thread [1]. If this patch is accepted, then that one may be worth considering as well.
>
>[1] https://www.postgresql.org/message-id/C1D2509E-E5D1-46B0-932C-B57AA7B963A1%40gmail.com
>
>--
>Chao Li (Evan)
>HighGo Software Co., Ltd.
>https://www.highgo.com/
>
>
>

>
+1 for using presence over value.

The patch looks good to me.
Regards,
Tingchuan Sun

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2026-06-03 03:21:49 Re: BM_IO_ERROR flag is lost in TerminateBufferIO due to order of operations in UnlockBufHdrExt
Previous Message wenhui qiu 2026-06-03 02:32:07 Re: Report oldest xmin source when autovacuum cannot remove tuples