| From: | Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> |
|---|---|
| To: | Daniel Gustafsson <daniel(at)yesql(dot)se> |
| Cc: | PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> |
| Subject: | Re: COPY JSON: use trailing commas in FORCE_ARRAY output |
| Date: | 2026-05-06 09:18:38 |
| Message-ID: | F08C3C8E-9769-4A96-B392-DA93E11AC3A5@gmail.com |
| Views: | Whole Thread | Raw Message | Download mbox | Resend email |
| Thread: | |
| Lists: | pgsql-hackers |
> On May 6, 2026, at 16:23, Daniel Gustafsson <daniel(at)yesql(dot)se> wrote:
>
>> On 6 May 2026, at 08:40, Chao Li <li(dot)evan(dot)chao(at)gmail(dot)com> wrote:
>
>> I was surprised by the comma placement. It is valid JSON, but it looks quite uncommon.
>
> It might look uncommon, but for very wide lines it's IMHO preferrable to not
> have to scroll all the way to the end of the line to know that the line is part
> of an array.
>
>> For comparison, the existing json_agg() places commas at the end of the line:
>
> That's true, but json_agg() and COPY TO in ndjson format have different use
> cases.
>
>> ..it should not have any performance impact.
>
> It does add branches though, and in one branch use a non-inlined function where
> previously it would unconditionally use an inline function. ISTM it would
> still be valuable to do performance testing given that COPY is commonly used in
> performance sensitive settings.
>
Make sense. I just did a test to compare the performance between master and the patch:
For the data setup, since the patch only changes where the comma is emitted, I intentionally used a table with only one column, to minimize the cost of formatting each row:
```
DROP TABLE IF EXISTS copy_json_force_array_perf;
CREATE UNLOGGED TABLE copy_json_force_array_perf(id int);
INSERT INTO copy_json_force_array_perf
SELECT g
FROM generate_series(1, 10000000) AS g;
VACUUM ANALYZE copy_json_force_array_perf;
\timing on
```
On master:
```
evantest=# COPY copy_json_force_array_perf TO '/dev/null' WITH (FORMAT json, FORCE_ARRAY);
COPY 10000000
Time: 1208.694 ms (00:01.209)
evantest=# COPY copy_json_force_array_perf TO '/dev/null' WITH (FORMAT json, FORCE_ARRAY);
COPY 10000000
Time: 1200.203 ms (00:01.200)
evantest=# COPY copy_json_force_array_perf TO '/dev/null' WITH (FORMAT json, FORCE_ARRAY);
COPY 10000000
Time: 1238.639 ms (00:01.239)
evantest=# COPY copy_json_force_array_perf TO '/dev/null' WITH (FORMAT json, FORCE_ARRAY);
COPY 10000000
Time: 1211.344 ms (00:01.211)
evantest=# COPY copy_json_force_array_perf TO '/dev/null' WITH (FORMAT json, FORCE_ARRAY);
COPY 10000000
Time: 1252.197 ms (00:01.252)
evantest=# COPY copy_json_force_array_perf TO '/dev/null' WITH (FORMAT json, FORCE_ARRAY);
COPY 10000000
Time: 1223.510 ms (00:01.224)
evantest=# COPY copy_json_force_array_perf TO '/dev/null' WITH (FORMAT json, FORCE_ARRAY);
COPY 10000000
Time: 1212.378 ms (00:01.212)
```
Average: ~1221 ms
With the patch:
```
evantest=# COPY copy_json_force_array_perf TO '/dev/null' WITH (FORMAT json, FORCE_ARRAY);
COPY 10000000
Time: 1218.580 ms (00:01.219)
evantest=# COPY copy_json_force_array_perf TO '/dev/null' WITH (FORMAT json, FORCE_ARRAY);
COPY 10000000
Time: 1212.913 ms (00:01.213)
evantest=# COPY copy_json_force_array_perf TO '/dev/null' WITH (FORMAT json, FORCE_ARRAY);
COPY 10000000
Time: 1204.350 ms (00:01.204)
evantest=# COPY copy_json_force_array_perf TO '/dev/null' WITH (FORMAT json, FORCE_ARRAY);
COPY 10000000
Time: 1205.276 ms (00:01.205)
evantest=# COPY copy_json_force_array_perf TO '/dev/null' WITH (FORMAT json, FORCE_ARRAY);
COPY 10000000
Time: 1202.088 ms (00:01.202)
evantest=# COPY copy_json_force_array_perf TO '/dev/null' WITH (FORMAT json, FORCE_ARRAY);
COPY 10000000
Time: 1222.390 ms (00:01.222)
```
Average: ~1211 ms
The difference doesn't look quite meaningful from this test. I built with debug and asserts disabled, and compiled with -O2. The output was written to /dev/null to avoid client/network overhead. The tests ran on my MacBook M4.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
| From | Date | Subject | |
|---|---|---|---|
| Next Message | Kirill Reshke | 2026-05-06 09:20:15 | Re: support create index on virtual generated column. |
| Previous Message | Zhongpu Chen | 2026-05-06 09:15:18 | Re: Proposal: tighten validation for legacy EUC encodings or document that accepted byte sequences may be unconvertible to UTF8 |