Re: WIP Incremental JSON Parser

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP Incremental JSON Parser
Date: 2024-03-18 10:32:32
Message-ID: CAD5tBcLjFnTnAvUpQg-K5sJa3ECnEj8CQZ-AYrH1k+F=42D+Vg@mail.gmail.com
Lists: pgsql-hackers

On Thu, Mar 14, 2024 at 3:35 PM Jacob Champion <
jacob(dot)champion(at)enterprisedb(dot)com> wrote:

> I've been poking at the partial token logic. The json_errdetail() bug
> mentioned upthread (e.g. for an invalid input `[12zz]` and small chunk
> size) seems to be due to the disconnection between the "main" lex
> instance and the dummy_lex that's created from it. The dummy_lex
> contains all the information about the failed token, which is
> discarded upon an error return:
>
> > partial_result = json_lex(&dummy_lex);
> > if (partial_result != JSON_SUCCESS)
> > return partial_result;
>
> In these situations, there's an additional logical error:
> lex->token_start is pointing to a spot in the string after
> lex->token_terminator, which breaks an invariant that will mess up
> later pointer math. Nothing appears to be setting lex->token_start to
> point into the partial token buffer until _after_ the partial token is
> successfully lexed, which doesn't seem right -- in addition to the
> pointer math problems, if a previous chunk was freed (or on a stale
> stack frame), lex->token_start will still be pointing off into space.
> Similarly, wherever we set token_terminator, we need to know that
> token_start is pointing into the same buffer.
>
> Determining the end of a token is now done in two separate places
> between the partial- and full-lexer code paths, which is giving me a
> little heartburn. I'm concerned that those could drift apart, and if
> the two disagree on where to end a token, we could lose data into the
> partial token buffer in a way that would be really hard to debug. Is
> there a way to combine them?
>

Not very easily. But I think (and hope) I've fixed the issue you identified
above, where we could return before lex->token_start was properly set.
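The gist is to make sure the token pointers the caller sees always refer to
the buffer that was actually lexed. Very roughly, something along these lines
(just a sketch -- the details in the patch differ):

    partial_result = json_lex(&dummy_lex);

    /*
     * dummy_lex's token_start/token_terminator point into the partial-token
     * buffer, not the current chunk.  Copy them back before any early return,
     * so json_errdetail() and later pointer math operate on the buffer that
     * was actually lexed rather than on a stale chunk pointer.
     */
    lex->token_start = dummy_lex.token_start;
    lex->token_terminator = dummy_lex.token_terminator;

    if (partial_result != JSON_SUCCESS)
        return partial_result;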

Attached is a new set of patches that does that and is updated for the
json_errdetail() changes.

cheers

andrew

Attachment Content-Type Size
v11-0001-Introduce-a-non-recursive-JSON-parser.patch text/x-patch 424.9 KB
v11-0002-Add-support-for-incrementally-parsing-backup-man.patch text/x-patch 7.6 KB
v11-0003-Use-incremental-parsing-of-backup-manifests.patch text/x-patch 11.9 KB
