Re: WIP Incremental JSON Parser

From: Andrew Dunstan <andrew(at)dunslane(dot)net>
To: Jacob Champion <champion(dot)p(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Robert Haas <robertmhaas(at)gmail(dot)com>
Subject: Re: WIP Incremental JSON Parser
Date: 2024-01-15 15:48:00
Message-ID: 126d90b3-140e-5b99-eefb-b670c493f2d0@dunslane.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers


On 2024-01-09 Tu 13:46, Jacob Champion wrote:
> On Tue, Dec 26, 2023 at 8:49 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
>> Quite a long time ago Robert asked me about the possibility of an
>> incremental JSON parser. I wrote one, and I've tweaked it a bit, but the
>> performance is significantly worse that that of the current Recursive
>> Descent parser.
> The prediction stack is neat. It seems like the main loop is hit so
> many thousands of times that micro-optimization would be necessary...
> I attached a sample diff to get rid of the strlen calls during
> push_prediction(), which speeds things up a bit (8-15%, depending on
> optimization level) on my machines.
>
> Maybe it's possible to condense some of those productions down, and
> reduce the loop count? E.g. does every "scalar" production need to go
> three times through the loop/stack, or can the scalar semantic action
> just peek at the next token prediction and do all the callback work at
> once?
>
>> + case JSON_SEM_SCALAR_CALL:
>> + {
>> + json_scalar_action sfunc = sem->scalar;
>> +
>> + if (sfunc != NULL)
>> + (*sfunc) (sem->semstate, scalar_val, scalar_tok);
>> + }
> Is it safe to store state (scalar_val/scalar_tok) on the stack, or
> does it disappear if the parser hits an incomplete token?
>
>> One possible use would be in parsing large manifest files for
>> incremental backup.
> I'm keeping an eye on this thread for OAuth, since the clients have to
> parse JSON as well. Those responses tend to be smaller, though, so
> you'd have to really be hurting for resources to need this.
>

I've incorporated your suggestion, and fixed the bug you identified.

The attached also adds support for incrementally parsing backup
manifests, and uses that in the three places we call the manifest parser.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com

Attachment Content-Type Size
v3-0001-Introduce-a-non-recursive-JSON-parser.patch text/x-patch 420.1 KB
v3-0002-Add-support-for-incrementally-parsing-backup-mani.patch text/x-patch 7.6 KB
v3-0003-Use-incremental-parsing-of-backup-manifests.patch text/x-patch 11.5 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Kirk Wolak 2024-01-15 15:49:19 Re: Oom on temp (un-analyzed table caused by JIT) V16.1
Previous Message Alvaro Herrera 2024-01-15 15:37:49 minor replication slot docs edits