[PATCH] json_lex_string: don't overread on bad UTF8

From: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Cc: Andrew Dunstan <andrew(at)dunslane(dot)net>
Subject: [PATCH] json_lex_string: don't overread on bad UTF8
Date: 2024-04-30 17:39:04
Message-ID: CAOYmi+ncM7pwLS3AnKCSmoqqtpjvA8wmCdoBtKA3ZrB2hZG6zA@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi all,

When json_lex_string() hits certain types of invalid input, it calls
pg_encoding_mblen_bounded(), which assumes that its input is
null-terminated and calls strnlen(). But the JSON lexer is constructed
with an explicit string length, and we don't ensure that the string is
null-terminated in all cases, so we can walk off the end of the
buffer. This isn't really relevant on the server side, where you'd
have to get a superuser to help you break string encodings, but for
client-side usage on untrusted input (such as my OAuth patch) it would
be more important.

Attached is a draft patch that explicitly checks against the
end-of-string pointer and clamps the token_terminator to it. Note that
this removes the only caller of pg_encoding_mblen_bounded() and I'm
not sure what we should do with that function. It seems like a
reasonable API, just not here.

The new test needs to record two versions of the error message, one
for invalid token and one for invalid escape sequence. This is
because, for smaller chunk sizes, the partial-token logic in the
incremental JSON parser skips the affected code entirely when it can't
find an ending double-quote.

Tangentially: Should we maybe rethink pieces of the json_lex_string
error handling? For example, do we really want to echo an incomplete
multibyte sequence once we know it's bad? It also looks like there are
places where the FAIL_AT_CHAR_END macro is called after the `s`
pointer has already advanced past the code point of interest. I'm not
sure if that's intentional.

Thanks,
--Jacob

Attachment Content-Type Size
0001-json_lex_string-don-t-overread-on-bad-UTF8.patch application/octet-stream 1.9 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2024-04-30 17:52:02 Re: pg17 issues with not-null contraints
Previous Message Robert Haas 2024-04-30 17:31:09 Re: A problem about partitionwise join