pg_parse_json() should not leak token copies on failure

From: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
To: PostgreSQL Hackers <pgsql-hackers(at)postgresql(dot)org>
Subject: pg_parse_json() should not leak token copies on failure
Date: 2024-04-30 21:29:27
Message-ID: CAOYmi+kb38EciwyBQOf9peApKGwraHqA7pgzBkvoUnw5BRfS1g@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

When a client of our JSON parser registers semantic action callbacks,
the parser will allocate copies of the lexed tokens to pass into those
callbacks. The intent is for the callbacks to always take ownership of
the copied memory, and if they don't want it then they can pfree() it.

However, if parsing fails on the token before the callback is invoked,
that allocation is leaked. That's not a problem for the server side,
or for clients that immediately exit on parse failure, but if the JSON
parser gets added to libpq for OAuth, it will become more of a
problem.

(I'm building a fuzzer to flush out these sorts of issues; not only
does it complain loudly about the leaks, but the accumulation of
leaked memory puts a hard limit on how long a fuzzer run can last. :D)

Attached is a draft patch to illustrate what I mean, but it's
incomplete: it only solves the problem for scalar values. We also make
a copy of object field names, which is much harder to fix, because we
make only a single copy and pass that to both the object_field_start
and object_field_end callbacks. Consider:

- If a client only implements object_field_start, it takes ownership
of the field name when we call it. It can free the allocation if it
decides that the field is irrelevant.
- The same is true for clients that only implement object_field_end.
- If a client implements both callbacks, it takes ownership of the
field name when we call object_field_start. But irrelevant field names
can't be freed during that callback, because the same copy is going to
be passed to object_field_end. And object_field_end isn't guaranteed
to be called, because parsing could fail for any number of reasons
between now and then. So what code should be responsible for cleanup?
The parser doesn't know whether the callback already freed the pointer
or kept a reference to it in semstate.

Any thoughts on how we can improve this? I was thinking we could maybe
make two copies of the field name and track ownership individually?

Thanks,
--Jacob

Attachment Content-Type Size
0001-WIP-fix-leak-of-scalar-value-on-lex-failure.patch application/octet-stream 2.1 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Thomas Spear 2024-04-30 21:40:50 TLS certificate alternate trust paths issue in libpq - certificate chain validation failing
Previous Message Dmitry Koval 2024-04-30 21:14:07 Re: Add SPLIT PARTITION/MERGE PARTITIONS commands