Re: WIP Incremental JSON Parser

From: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP Incremental JSON Parser
Date: 2024-03-14 19:35:29
Message-ID: CAOYmi+kgckFsBNOBsrekDipq88G56pVndVXKcA7qMWYx9feChQ@mail.gmail.com
Lists: pgsql-hackers

I've been poking at the partial token logic. The json_errdetail() bug
mentioned upthread (e.g. for an invalid input `[12zz]` and small chunk
size) seems to be due to the disconnection between the "main" lex
instance and the dummy_lex that's created from it. The dummy_lex
contains all the information about the failed token, which is
discarded upon an error return:

> partial_result = json_lex(&dummy_lex);
> if (partial_result != JSON_SUCCESS)
>     return partial_result;

In these situations, there's an additional logical error:
lex->token_start is pointing to a spot in the string after
lex->token_terminator, which breaks an invariant that will mess up
later pointer math. Nothing appears to be setting lex->token_start to
point into the partial token buffer until _after_ the partial token is
successfully lexed, which doesn't seem right -- in addition to the
pointer math problems, if a previous chunk was freed (or lived on a
now-stale stack frame), lex->token_start will still be pointing off into space.
Similarly, wherever we set token_terminator, we need to know that
token_start is pointing into the same buffer.
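
To make the invariant concrete, here is a rough sketch of the kind of check
that would have to hold whenever token_start or token_terminator is updated.
SketchLex and both functions are hypothetical, simplified stand-ins for
illustration, not the real JsonLexContext or anything in the patch; the only
assumption is that the lexer knows which buffer (the current chunk or the
partial token buffer) it is lexing from.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified lexer state; NOT the real JsonLexContext. */
typedef struct SketchLex
{
	const char *input;				/* buffer currently being lexed */
	size_t		input_length;		/* number of bytes in that buffer */
	const char *token_start;
	const char *token_terminator;
} SketchLex;

/*
 * True if input <= token_start <= token_terminator <= input + input_length.
 * Addresses are compared as integers because relational comparison of
 * pointers into different objects is undefined in C, and a stale
 * token_start is exactly that case.
 */
static bool
sketch_token_pointers_ok(const SketchLex *lex)
{
	uintptr_t	begin = (uintptr_t) lex->input;
	uintptr_t	end = begin + lex->input_length;
	uintptr_t	start = (uintptr_t) lex->token_start;
	uintptr_t	term = (uintptr_t) lex->token_terminator;

	return begin <= start && start <= term && term <= end;
}

/* Set both pointers together so they can never refer to different buffers. */
static void
sketch_set_token(SketchLex *lex, const char *start, const char *end)
{
	lex->token_start = start;
	lex->token_terminator = end;
	assert(sketch_token_pointers_ok(lex));
}
```

Updating both pointers through a single setter is one way to guarantee that
they are always moved to a new buffer at the same time.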

Determining the end of a token is now done in two separate places
between the partial- and full-lexer code paths, which is giving me a
little heartburn. I'm concerned that those could drift apart, and if
the two disagree on where to end a token, we could lose data into the
partial token buffer in a way that would be really hard to debug. Is
there a way to combine them?
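
One possible shape for combining them, purely as a sketch: a single helper
that decides how far a token extends, which both the full-lexer path and the
partial-token path would call. The function names and the choice of "token
characters" below are assumptions made up for illustration and do not match
the real lexer's rules.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical shared helper: number of leading bytes of s[0..len) that
 * could still belong to a number or keyword token. The character set is an
 * illustrative assumption, not the real lexer's definition.
 */
static size_t
sketch_token_span(const char *s, size_t len)
{
	size_t		i = 0;

	while (i < len &&
		   (isalnum((unsigned char) s[i]) || s[i] == '_' ||
			s[i] == '.' || s[i] == '+' || s[i] == '-'))
		i++;
	return i;
}

/*
 * Returns true if the token provably ends inside this buffer. Returns false
 * if it runs to the end of the buffer, in which case it may continue in the
 * next chunk and the *span bytes seen so far belong in the partial token
 * buffer. Both code paths would use this same answer.
 */
static bool
sketch_token_complete(const char *s, size_t len, size_t *span)
{
	*span = sketch_token_span(s, len);
	return *span < len;
}
```

With one function answering "where does this token end?", the partial path
only decides where the bytes go, so the two paths cannot disagree about the
boundary.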

Thanks,
--Jacob
