| From: | Andrew Dunstan <andrew(at)dunslane(dot)net> |
|---|---|
| To: | Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com> |
| Cc: | pgsql-hackers(at)postgresql(dot)org, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> |
| Subject: | Re: [PATCH] Reject ENCODING option for COPY TO FORMAT JSON |
| Date: | 2026-05-04 14:19:21 |
| Message-ID: | 297f1c95-63dd-4180-824e-3448e2e25fa3@dunslane.net |
| Lists: | pgsql-hackers |
On 2026-04-29 We 12:49 PM, Ayush Tiwari wrote:
> Hi,
>
> On Mon, 20 Apr 2026 at 20:31, Ayush Tiwari <ayushtiwari(dot)slg01(at)gmail(dot)com> wrote:
>
> Hi,
>
>
> On Mon, 20 Apr 2026 at 19:09, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us> wrote:
>
> Seems to me the correct thing here is to make it work like the other
> cases, ie perform pg_server_to_any(). I have exactly no sympathy for
> the argument about the RFC saying it must be UTF-8, not least because
> that's not in fact what is implemented (what if the server encoding
> isn't UTF-8?).
>
>
> Agreed. I initially thought rejecting the option was the safer route
> given the RFC, but as you pointed out, we aren't enforcing UTF-8
> strictly on the server side anyway.
>
>
> Rejecting this option altogether doesn't improve anything, not
> functionally, not specs-compliance-wise, nor according to the
> principle of least surprise.
>
> Makes sense. Implementing the conversion properly keeps the JSON
> format consistent with how the text and CSV formats behave.
>
>
> No, you don't get to punt this till later. Once we ship v19 there's
> going to be a strong expectation of backwards compatibility.
>
> The idea of sending UTF-8 to a client that's set client_encoding to
> something else would be risible, if it weren't a security hazard.
>
>
> I agree that sending unconverted bytes to a client with a mismatched
> encoding is clearly a security hazard that needs addressing. I did
> not consider the backward compatibility part, my bad.
>
> I tried applying pg_server_to_any() to json_buf after
> composite_to_json() returns, which correctly covers both explicit
> ENCODING option specifications and implicit client_encoding
> mismatches.
>
> Let me send a patch with code and associated test cases.
>
> Attached patch with round trip test case. Please review and let me
> know if it's in the right direction.
>
>
> I have registered this patch set in the CommitFest for tracking:
> https://commitfest.postgresql.org/patch/6700/
>
> Please let me know if the patch looks good, and whether I need to add
> it to the open items list for PG 19.
>
>
Basically good, I think. I have modified your test a bit. It now tests
more directly for the presence of the LATIN-1 encoded character and the
absence of the UTF-8 encoded character by reading the file back in with
pg_read_binary_file, and I added a test for implicit encoding conversion
by setting client_encoding.
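For illustration, here is a minimal standalone C sketch (not PostgreSQL code) of the property the test checks, assuming a UTF-8 server and LATIN1 as the target encoding. The character 'é' is the two bytes 0xC3 0xA9 in UTF-8 but the single byte 0xE9 in LATIN1, so correctly converted output must contain 0xE9 and no 0xC3 0xA9 pair. The helpers utf8_to_latin1() and contains_bytes() are hypothetical; in the patch the conversion itself goes through pg_server_to_any().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: convert a UTF-8 byte string to LATIN1 (ISO-8859-1).
 * LATIN1 covers only code points U+0000..U+00FF, so every convertible
 * character is either one ASCII byte or a two-byte UTF-8 sequence whose
 * lead byte is 0xC2 or 0xC3.  Anything else has no LATIN1 equivalent (or
 * is malformed) and is reported as a failure instead of being copied
 * through unconverted.
 *
 * 'out' needs room for at least 'len' bytes, since the output never grows.
 * On success, returns true and stores the output length in *outlen.
 */
static bool
utf8_to_latin1(const unsigned char *in, size_t len,
               unsigned char *out, size_t *outlen)
{
    size_t      i = 0;
    size_t      o = 0;

    assert(in != NULL && out != NULL && outlen != NULL);

    while (i < len)
    {
        unsigned char c = in[i];

        if (c < 0x80)
        {
            /* ASCII bytes are identical in both encodings */
            out[o++] = c;
            i++;
        }
        else if ((c == 0xC2 || c == 0xC3) && i + 1 < len &&
                 (in[i + 1] & 0xC0) == 0x80)
        {
            /* U+0080..U+00FF: low bits of lead byte + 6 bits of continuation */
            out[o++] = (unsigned char) (((c & 0x1F) << 6) | (in[i + 1] & 0x3F));
            i += 2;
        }
        else
            return false;       /* untranslatable or malformed input */
    }
    *outlen = o;
    return true;
}

/* Hypothetical helper: true if 'needle' occurs anywhere in 'hay'. */
static bool
contains_bytes(const unsigned char *hay, size_t hlen,
               const unsigned char *needle, size_t nlen)
{
    if (nlen == 0)
        return true;
    for (size_t i = 0; i + nlen <= hlen; i++)
        if (memcmp(hay + i, needle, nlen) == 0)
            return true;
    return false;
}
```

For example, converting the 13-byte UTF-8 string {"a":"café"} yields 12 bytes containing 0xE9 and no 0xC3 0xA9 pair, while U+20AC (€, UTF-8 bytes 0xE2 0x82 0xAC) has no LATIN1 equivalent and is rejected, much as PostgreSQL raises an error for untranslatable characters rather than passing the raw bytes through.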
cheers
andrew
--
Andrew Dunstan
EDB: https://www.enterprisedb.com
| Attachment | Content-Type | Size |
|---|---|---|
| v4-0001-Apply-encoding-conversion-in-COPY-TO-FORMAT-JSON.patch | text/x-patch | 5.9 KB |