Re: WIP Incremental JSON Parser

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP Incremental JSON Parser
Date: 2024-02-22 09:38:32
Message-ID: 3efe8333-3285-1d9d-5ad3-0f9784d5c1c4@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 2024-02-21 We 15:26, Jacob Champion wrote:
> On Wed, Feb 21, 2024 at 6:50 AM Jacob Champion
> <jacob(dot)champion(at)enterprisedb(dot)com> wrote:
>> On Tue, Feb 20, 2024 at 9:32 PM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>>> *sigh* That's weird. I wonder why you can reproduce it and I can't. Can
>>> you give me details of the build? OS, compiler, path to source, build
>>> setup etc.? Anything that might be remotely relevant.
> This construction seems suspect, in json_lex_number():
>
>> if (lex->incremental && !lex->inc_state->is_last_chunk &&
>> len >= lex->input_length)
>> {
>> appendStringInfoString(&lex->inc_state->partial_token,
>> lex->token_start);
>> return JSON_INCOMPLETE;
>> }
> appendStringInfoString() isn't respecting the end of the chunk: if
> there's extra data after the chunk boundary (as
> AppendIncrementalManifestData() does) then all of that will be stuck
> onto the end of the partial_token.
>
> I'm about to context-switch off of this for the day, but I can work on
> a patch tomorrow if that'd be helpful. It looks like this is not the
> only call to appendStringInfoString().
>

Yeah, the issue seems to be with chunks of json that are not
null-terminated. We don't require that they be so this code was buggy.
It wasn't picked up earlier because the tests that I wrote did put a
null byte at the end. Patch 5 in this series fixes those issues and
adjusts most of the tests to add some trailing junk to the pieces of
json, so we can be sure that this is done right.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachment Content-Type Size
v8-0001-Introduce-a-non-recursive-JSON-parser.patch text/x-patch 421.1 KB
v8-0002-Add-support-for-incrementally-parsing-backup-mani.patch text/x-patch 7.6 KB
v8-0003-Use-incremental-parsing-of-backup-manifests.patch text/x-patch 11.5 KB
v8-0004-add-logging-traces-to-see-if-we-can-find-out-what.patch text/x-patch 3.3 KB
v8-0005-fixes-for-non-null-terminated-inputs-for-incremen.patch text/x-patch 6.1 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Sutou Kouhei 2024-02-22 09:39:48 Re: Make COPY format extendable: Extract COPY TO format implementations
Previous Message Daniel Gustafsson 2024-02-22 09:33:04 Re: Test to dump and restore objects left behind by regression