Re: Emitting JSON to file using COPY TO

From: jian he <jian(dot)universality(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Joe Conway <mail(at)joeconway(dot)com>, Junwang Zhao <zhjwpku(at)gmail(dot)com>, Florents Tselai <florents(dot)tselai(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Dean Rasheed <dean(dot)a(dot)rasheed(at)gmail(dot)com>, Daniel Verite <daniel(at)manitou-mail(dot)org>, Davin Shearer <davin(at)apache(dot)org>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: Emitting JSON to file using COPY TO
Date: 2026-03-08 16:16:08
Message-ID: CACJufxFFZqxC3p4WjpTEi4riaJm=pADX+py0yQ0=RWTn5cqK3Q@mail.gmail.com
Lists: pgsql-general pgsql-hackers

hi.

V27-0002 is still not bullet-proof.

drop table if exists t1;
create table t1(a int);
insert into t1 values (1);
copy (select * from t1) to stdout json;
{"a":1}
WARNING: resource was not closed: TupleDesc 0x7171d0ca3440 (18239,-1)

See also ExecAssignScanProjectionInfo -> ExecConditionalAssignProjectionInfo.
So in v28-0002, I changed it to:
+ /*
+ * composite_to_json() requires a stable TupleDesc. Since the slot's
+ * descriptor (slot->tts_tupleDescriptor) can change during the execution
+ * of a SELECT query, we use cstate->queryDesc->tupDesc instead. This
+ * precaution is only necessary when the output slot's TupleDesc is of
+ * type RECORDOID.
+ */
+ if (!cstate->rel && slot->tts_tupleDescriptor->tdtypeid == RECORDOID)
+ slot->tts_tupleDescriptor = cstate->queryDesc->tupDesc;

+ cstate->json_projvalues = (Datum *) palloc(natts * sizeof(Datum));
+ cstate->json_projnulls = (bool *) palloc(natts * sizeof(bool));
I changed it to
+ cstate->json_projvalues = palloc_array(Datum, natts);
+ cstate->json_projnulls = palloc_array(bool, natts);

+ rowdata = HeapTupleHeaderGetDatum(tup->t_data);
I changed it to
+ rowdata = HeapTupleGetDatum(tup);

Patch v28-0004 adds the json_projvalues and json_projnulls pointers to struct
CopyToStateData. I wondered whether adding these would slow down COPY TO with
the TEXT and CSV formats, so I ran a quick test using a 36-column table.

Surprisingly, v28 actually makes COPY TO with TEXT and CSV perform a little
faster, though I haven't found out why.
You may also try the attached test script: copyto_json_perfomance_test.nocfbot.
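For anyone without the attached script, a rough sketch of such a timing test might look like the following. The table, column, and file names here are illustrative, not taken from copyto_json_perfomance_test.nocfbot, and the JSON COPY assumes a server built with these patches:

```shell
#!/bin/sh
# Generate a benchmark SQL file for a 36-column table, then run it with:
#   psql -f copy_bench.sql
collist=""
for i in $(seq 1 36); do
    collist="$collist${collist:+, }g AS c$i"
done
cat > copy_bench.sql <<EOF
\timing on
CREATE TABLE copy_bench AS
SELECT $collist FROM generate_series(1, 1000000) g;
COPY copy_bench TO '/tmp/bench.txt' (FORMAT text);
COPY copy_bench TO '/tmp/bench.csv' (FORMAT csv);
COPY copy_bench TO '/tmp/bench.json' (FORMAT json);
EOF
echo "wrote copy_bench.sql"
```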

--
jian
https://www.enterprisedb.com/

Attachment Content-Type Size
v28-0002-json-format-for-COPY-TO.patch text/x-patch 23.7 KB
v28-0001-introduce-CopyFormat-refactor-CopyFormatOptions.patch text/x-patch 13.1 KB
v28-0003-Add-option-force_array-for-COPY-JSON-FORMAT.patch text/x-patch 12.3 KB
v28-0004-COPY-TO-JSON-support-column-lists.patch text/x-patch 12.7 KB
copyto_json_perfomance_test.nocfbot application/octet-stream 1.7 KB
