Re: [PATCH] Reject ENCODING option for COPY TO FORMAT JSON

From: Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>
Cc: pgsql-hackers(at)postgresql(dot)org, andrew(at)dunslane(dot)net
Subject: Re: [PATCH] Reject ENCODING option for COPY TO FORMAT JSON
Date: 2026-04-20 14:34:54
Message-ID: CAJTYsWU24n==-H_agoUguFxZjFV6jCzNamuJzj4ZgiFPUp2bRg@mail.gmail.com
Lists: pgsql-hackers

Hi,

On Mon, 20 Apr 2026 at 19:09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:

> Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com> writes:
> > COPY TO FORMAT JSON silently accepts the ENCODING option but doesn't
> > perform encoding conversion. CopyToJsonOneRow() sends the output of
> > composite_to_json() via CopySendData() without calling
> > pg_server_to_any(), unlike the text and CSV paths.
>
> > COPY t TO '/tmp/out.json' WITH (FORMAT json, ENCODING 'LATIN1');
>
> > On a UTF-8 server this produces UTF-8 output, not LATIN1.
>
> Seems to me the correct thing here is to make it work like the other
> cases, ie perform pg_server_to_any(). I have exactly no sympathy for
> the argument about the RFC saying it must be UTF-8, not least because
> that's not in fact what is implemented (what if the server encoding
> isn't UTF-8?).
>

Agreed. I initially thought rejecting the option was the safer route
given the RFC, but as you pointed out, we aren't enforcing
UTF-8 strictly on the server side anyway.

> Rejecting this option altogether doesn't improve anything, not
> functionally, not specs-compliance-wise, nor according to the
> principle of least surprise.
>

Makes sense. Implementing the conversion properly
keeps JSON format consistent with how the text and CSV formats behave.

> > The attached patch rejects the explicit ENCODING option for JSON
> > mode, consistent with how DELIMITER, NULL, DEFAULT, and HEADER are
> > already rejected. The implicit client_encoding case is a separate
> > design question (should COPY TO JSON always emit UTF-8 regardless
> > of client_encoding?) that maybe we should address separately and not as
> > part of v19.
>
> No, you don't get to punt this till later. Once we ship v19 there's
> going to be a strong expectation of backwards compatibility.
>
> The idea of sending UTF-8 to a client that's set client_encoding to
> something else would be risible, if it weren't a security hazard.
>

I agree that sending unconverted bytes to a client whose client_encoding
does not match is a security hazard that needs addressing. I had not
considered the backward-compatibility aspect, my bad.
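
To make the mismatch concrete, here is a small standalone Python
illustration (not PostgreSQL code; the sample string and variable names
are made up for this example) of what a LATIN1 client sees when it
receives UTF-8 bytes that were never converted:

```python
# Hypothetical illustration only: the sample value and names are assumptions.
text = "café"

# What an unconverted COPY would put on the wire from a UTF-8 server.
utf8_bytes = text.encode("utf-8")

# A client that believes the stream is LATIN1 misreads the two-byte
# UTF-8 sequence for "é" as two separate characters.
seen_by_client = utf8_bytes.decode("latin-1")

# What the client should have received after a proper conversion.
converted = text.encode("latin-1")

print(repr(seen_by_client))
print(converted.decode("latin-1") == text)
```

The client ends up with different text than the server stored, which is
the situation the conversion step is meant to rule out.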

I have been trying out a pg_server_to_any() call on json_buf after
composite_to_json() returns. That covers both an explicit ENCODING
option and an implicit client_encoding mismatch.
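
As a rough sketch of the intended behaviour (Python rather than the
actual C patch; copy_json_row and its parameters are hypothetical
stand-ins for CopyToJsonOneRow(), composite_to_json() and the
conversion that pg_server_to_any() performs), the flow would be to build
the JSON text in the server encoding and convert it only when the target
encoding differs:

```python
import json


def copy_json_row(row: dict, server_encoding: str, file_encoding: str) -> bytes:
    """Hypothetical stand-in for CopyToJsonOneRow(); not PostgreSQL code."""
    # Stand-in for composite_to_json(): JSON text held in the server encoding.
    json_buf = json.dumps(row, ensure_ascii=False).encode(server_encoding)

    # Stand-in for the pg_server_to_any() step: convert only when the
    # target encoding differs from the server encoding.
    if file_encoding != server_encoding:
        json_buf = json_buf.decode(server_encoding).encode(file_encoding)

    # One JSON object per output line.
    return json_buf + b"\n"


latin1_row = copy_json_row({"name": "café"}, "utf-8", "latin-1")
utf8_row = copy_json_row({"name": "café"}, "utf-8", "utf-8")
print(latin1_row, utf8_row)
```

In this sketch a character with no equivalent in the target encoding
makes Python raise UnicodeEncodeError; how the real patch reports that
case is not shown here.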

Let me send a patch with code and associated test cases.

Regards,
Ayush
