Re: WIP Incremental JSON Parser

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jacob Champion <jacob(dot)champion(at)enterprisedb(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, Peter Smith <smithpb2250(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP Incremental JSON Parser
Date: 2024-03-08 03:42:06
Message-ID: 682c8fff-355c-a04f-57ac-81055c4ccda8@dunslane.net
Lists: pgsql-hackers


On 2024-03-07 Th 10:28, Jacob Champion wrote:
> Some more observations as I make my way through the patch:
>
> In src/common/jsonapi.c,
>
>> +#define JSON_NUM_NONTERMINALS 6
> Should this be 5 now?

Yep.

>
>> + res = pg_parse_json_incremental(&(incstate->lex), &(incstate->sem),
>> + chunk, size, is_last);
>> +
>> + expected = is_last ? JSON_SUCCESS : JSON_INCOMPLETE;
>> +
>> + if (res != expected)
>> + json_manifest_parse_failure(context, "parsing failed");
> This leads to error messages like
>
> pg_verifybackup: error: could not parse backup manifest: parsing failed
>
> which I would imagine is going to lead to confused support requests in
> the event that someone does manage to corrupt their manifest. Can we
> make use of json_errdetail() and print the line and column numbers?
> Patch 0001 over at [1] has one approach to making json_errdetail()
> workable in frontend code.

Looks sound at first glance. Maybe we should get that pushed ASAP so we
can take advantage of it.

>
> Top-level scalars like `false` or `12345` do not parse correctly if
> the chunk size is too small; instead json_errdetail() reports 'Token
> "" is invalid'. With small chunk sizes, json_errdetail() additionally
> segfaults on constructions like `[tru]` or `12zz`.

Ugh. Will investigate.
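
For anyone following along, the failures quoted above all involve a token that straddles a chunk boundary. The sketch below is a toy illustration of that situation, not code from the patch: ToyLexer, toy_feed and toy_parse_chunked are made-up names, and the "parser" only recognizes a single top-level scalar (true, false, null, or a run of digits). It assumes the simplest possible fix, which is to carry the partial token between calls and classify it only once the last chunk has arrived. The driver loop mirrors the expected = is_last ? SUCCESS : INCOMPLETE check quoted earlier.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result codes, loosely modelled on JSON_SUCCESS / JSON_INCOMPLETE. */
typedef enum { TOY_SUCCESS, TOY_INCOMPLETE, TOY_INVALID } ToyResult;

typedef struct
{
    char   partial[64];   /* token bytes seen so far, across all chunks */
    size_t len;
} ToyLexer;

/* True if the buffered bytes form one complete scalar this toy understands. */
static bool
toy_is_scalar(const char *tok, size_t len)
{
    static const char *const literals[] = {"true", "false", "null"};
    size_t i;

    for (i = 0; i < sizeof(literals) / sizeof(literals[0]); i++)
        if (len == strlen(literals[i]) && memcmp(tok, literals[i], len) == 0)
            return true;

    if (len == 0)
        return false;
    for (i = 0; i < len; i++)
        if (tok[i] < '0' || tok[i] > '9')
            return false;
    return true;
}

/*
 * Feed one chunk. Bytes are appended to the carried-over partial token, and
 * the token is only classified when is_last is set, so a chunk ending in the
 * middle of "fal" is reported as incomplete rather than as an invalid token.
 */
static ToyResult
toy_feed(ToyLexer *lex, const char *chunk, size_t size, bool is_last)
{
    assert(lex != NULL && chunk != NULL);

    if (lex->len + size > sizeof(lex->partial))
        return TOY_INVALID;     /* toy limit: token too long to buffer */
    memcpy(lex->partial + lex->len, chunk, size);
    lex->len += size;

    if (!is_last)
        return TOY_INCOMPLETE;
    return toy_is_scalar(lex->partial, lex->len) ? TOY_SUCCESS : TOY_INVALID;
}

/* Split input into chunk_size pieces and feed them in order. */
static ToyResult
toy_parse_chunked(const char *input, size_t chunk_size)
{
    ToyLexer lex = {{0}, 0};
    size_t   total = strlen(input);
    size_t   off = 0;

    if (chunk_size == 0)
        return TOY_INVALID;

    do
    {
        size_t    n = (total - off < chunk_size) ? total - off : chunk_size;
        bool      is_last = (off + n == total);
        ToyResult res = toy_feed(&lex, input + off, n, is_last);
        ToyResult expected = is_last ? TOY_SUCCESS : TOY_INCOMPLETE;

        if (res != expected)
            return res;
        off += n;
    } while (off < total);

    return TOY_SUCCESS;
}
```

With this toy, toy_parse_chunked("false", 1) and toy_parse_chunked("12345", 2) both return TOY_SUCCESS, while "12zz" and "tru" return TOY_INVALID at any chunk size, without reading past the buffered bytes. The real lexer handles far more than this and may well need a different approach.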

>
> For my local testing, I'm carrying the following diff in
> 001_test_json_parser_incremental.pl:
>
>> - ok($stdout =~ /SUCCESS/, "chunk size $size: test succeeds");
>> - ok(!$stderr, "chunk size $size: no error output");
>> + like($stdout, qr/SUCCESS/, "chunk size $size: test succeeds");
>> + is($stderr, "", "chunk size $size: no error output");
> This is particularly helpful when a test fails spuriously due to code
> coverage spray on stderr.
>

Makes sense, thanks.

I'll have a fresh patch set soon which will also take care of the bitrot.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
