| From: | jian he <jian(dot)universality(at)gmail(dot)com> |
|---|---|
| To: | Junwang Zhao <zhjwpku(at)gmail(dot)com> |
| Cc: | Florents Tselai <florents(dot)tselai(at)gmail(dot)com>, Joe Conway <mail(at)joeconway(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Andrew Dunstan <andrew(at)dunslane(dot)net>, Davin Shearer <davin(at)apache(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org> |
| Subject: | Re: Emitting JSON to file using COPY TO |
| Date: | 2026-02-09 03:48:09 |
| Message-ID: | CACJufxFLpMsbW41T65xJsw925MSE1bvOr6X+h_7sw8_qmDpTpA@mail.gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-general pgsql-hackers |
On Sat, Feb 7, 2026 at 9:27 PM Junwang Zhao <zhjwpku(at)gmail(dot)com> wrote:
>
> Here are some comments on v23:
>
> 0001: The refactor looks straightforward to me. Introducing a format
> field should make future extensions easier. One suggestion is that we
> could add some helper macros around format, for example:
>
> #define IS_FORMAT_CSV(format) (format == COPY_FORMAT_CSV)
> #define IS_FORMAT_TEXT_LIKE(format) \
> (format == COPY_FORMAT_TEXT || format == COPY_FORMAT_CSV)
>
> I think this would improve readability.
Personally, I don't like marcos....
>
> 0002: Since you have moved the `CopyFormat enum` into 0001, the
> following commit msg should be rephrased.
>
> The CopyFormat enum was originally contributed by Joel Jacobson
> joel(at)compiler(dot)org, later refactored by Jian He to address various issues, and
> further adapted by Junwang Zhao to support the newly introduced CopyToRoutine
> struct (commit 2e4127b6d2).
>
> - if (opts_out->format == COPY_FORMAT_BINARY && opts_out->delim)
> - ereport(ERROR,
> - (errcode(ERRCODE_SYNTAX_ERROR),
> - /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
> - errmsg("cannot specify %s in BINARY mode", "DELIMITER")));
> + if (opts_out->delim)
> + {
> + if (opts_out->format == COPY_FORMAT_BINARY)
> + ereport(ERROR,
> + errcode(ERRCODE_SYNTAX_ERROR),
> + /*- translator: %s is the name of a COPY option, e.g. ON_ERROR */
> + errmsg("cannot specify %s in BINARY mode", "DELIMITER"));
> + else if (opts_out->format == COPY_FORMAT_JSON)
> + ereport(ERROR,
> + errcode(ERRCODE_SYNTAX_ERROR),
> + errmsg("cannot specify %s in JSON mode", "DELIMITER"));
> + }
>
> Can we add a function that converts CopyFormat to a string? Treating
> CopyFormat as %s in error messages would make the code shorter.
> However, I'm not sure whether this aligns with translation
> conventions, correct me if I'm wrong.
>
I don’t think this is worth the added complexity.
That said, I tried to simplify the code and changed it to:
if (opts_out->delim &&
(opts_out->format == COPY_FORMAT_BINARY ||
opts_out->format == COPY_FORMAT_JSON))
ereport(ERROR,
errcode(ERRCODE_SYNTAX_ERROR),
opts_out->format == COPY_FORMAT_BINARY
? errmsg("cannot specify %s in BINARY mode", "DELIMITER")
: errmsg("cannot specify %s in JSON mode", "DELIMITER"));
> - * CSV and text formats share the same TextLike routines except for the
> + * CSV and text, json formats share the same TextLike routines except for the
>
> I'd suggest rewording to `CSV, text and json ...`. The same applied to
> other parts in this patch.
>
sure.
> 0003: The commit message includes some changes(adapt the newly
> introduced CopyToRoutine) that actually belong to 0002; it would be
> better to remove them from this commit.
>
0002 commit message:
"""
This introduces the JSON format option for the COPY TO command, allowing users
to export query results or table data directly as a single JSON object or a
stream of JSON objects.
The JSON format is currently supported only for COPY TO operations; it
is not available for COPY FROM.
JSON format is incompatible with some standard text/CSV parsing or
formatting options,
including:
- HEADER
- DEFAULT
- NULL
- DELIMITER
- FORCE QUOTE / FORCE NOT NULL
Regression tests covering valid JSON exports and error handling for
incompatible options have been added to src/test/regress/sql/copy.sql.
"""
0003 commit message:
"""
Add option force_array for COPY JSON FORMAT
This adds the force_array option, which is available exclusively
when using COPY TO with the JSON format.
When enabled, this option wraps the output in a top-level JSON array
(enclosed in square brackets with comma-separated elements), making the
entire result a valid single JSON value. Without this option, the default
behavior is to output a stream of independent JSON objects.
Attempting to use this option with COPY FROM or with formats other than
JSON will raise an error.
"""
> + if (cstate->json_row_delim_needed && cstate->opts.force_array)
> + CopySendChar(cstate, ',');
> + else if (cstate->opts.force_array)
> + {
> + /* first row needs no delimiter */
> + CopySendChar(cstate, ' ');
> + cstate->json_row_delim_needed = true;
> + }
>
> can we do this:
>
> if (cstate->opts.force_array)
> {
> if (cstate->json_row_delim_needed)
> CopySendChar(cstate, ',');
> else
> {
> /* first row needs no delimiter */
> CopySendChar(cstate, ' ');
> cstate->json_row_delim_needed = true;
> }
> }
>
good suggestion.
If more people think WRAP_ARRAY is better than FORCE_ARRAY, we can
switch to it accordingly.
The change itself is quite straightforward.
| Attachment | Content-Type | Size |
|---|---|---|
| v24-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch | text/x-patch | 13.1 KB |
| v24-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch | text/x-patch | 10.7 KB |
| v24-0002-json-format-for-COPY-TO.patch | text/x-patch | 21.8 KB |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Vitaly Davydov | 2026-02-09 10:42:56 | Re: Support logical replication of DDLs |
| Previous Message | Adrian Klaver | 2026-02-09 01:41:47 | Re: REFRESH MATERIALIZED VIEW CONCURRENTLY is blocking autovacuum on table |
| From | Date | Subject | |
|---|---|---|---|
| Next Message | jian he | 2026-02-09 03:59:46 | Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row |
| Previous Message | Chao Li | 2026-02-09 03:41:16 | Re: log_min_messages per backend type |