Re: WIP Incremental JSON Parser

From: Jacob Champion <champion(dot)p(at)gmail(dot)com>
To: Andrew Dunstan <andrew(at)dunslane(dot)net>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: WIP Incremental JSON Parser
Date: 2024-01-09 18:46:17
Message-ID: CAGu=u8is5+9T8DumiKedpwsW1ef1whh0EHBLXrzBOdADQL6wfg@mail.gmail.com
Lists: pgsql-hackers

On Tue, Dec 26, 2023 at 8:49 AM Andrew Dunstan <andrew(at)dunslane(dot)net> wrote:
> Quite a long time ago Robert asked me about the possibility of an
> incremental JSON parser. I wrote one, and I've tweaked it a bit, but the
> performance is significantly worse than that of the current Recursive
> Descent parser.

The prediction stack is neat. It seems like the main loop is hit so
many thousands of times that micro-optimization would be necessary...
I attached a sample diff to get rid of the strlen calls during
push_prediction(), which speeds things up a bit (8-15%, depending on
optimization level) on my machines.
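
For readers without the attachment, here is a minimal sketch of the general idea, not the diff itself: record each production's length once, at compile time, so the push never has to rescan the string. Every name here (production, PRODUCTION, pred_stack, PRED_STACK_MAX, pop_prediction) is hypothetical, and the stack layout is an assumption rather than what the patch actually does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the code from the patch. */
#define PRED_STACK_MAX 64

typedef struct
{
    size_t      len;        /* number of symbols, fixed at compile time */
    const char *syms;       /* right-hand side of the production */
} production;

/* sizeof on a string literal counts the trailing NUL, hence the -1. */
#define PRODUCTION(s) { sizeof(s) - 1, (s) }

typedef struct
{
    char        stack[PRED_STACK_MAX];
    size_t      depth;
} pred_stack;

/* Push a production without calling strlen(): the length is already stored. */
static void
push_prediction(pred_stack *ps, const production *p)
{
    assert(ps->depth + p->len <= PRED_STACK_MAX);

    /* Push in reverse so the first symbol ends up on top of the stack. */
    for (size_t i = p->len; i > 0; i--)
        ps->stack[ps->depth++] = p->syms[i - 1];
}

/* Pop the next predicted symbol. */
static char
pop_prediction(pred_stack *ps)
{
    assert(ps->depth > 0);
    return ps->stack[--ps->depth];
}
```

Since the push runs on every expansion in the main loop, trading a per-call scan for a stored size_t is the kind of small saving that can add up there.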

Maybe it's possible to condense some of those productions down, and
reduce the loop count? E.g. does every "scalar" production need to go
three times through the loop/stack, or can the scalar semantic action
just peek at the next token prediction and do all the callback work at
once?

> +        case JSON_SEM_SCALAR_CALL:
> +            {
> +                json_scalar_action sfunc = sem->scalar;
> +
> +                if (sfunc != NULL)
> +                    (*sfunc) (sem->semstate, scalar_val, scalar_tok);
> +            }

Is it safe to store state (scalar_val/scalar_tok) on the stack, or
does it disappear if the parser hits an incomplete token?
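
To make the concern concrete, here is a toy sketch (not the patch's code, and not JSON) of the alternative: anything that must survive a "need more input" return lives in a state struct owned by the caller rather than in locals of the parse function. The names inc_state, feed_chunk and TOK_MAX are hypothetical, and the comma-separated input format is only an assumption to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOK_MAX 32

/* Hypothetical parser state that outlives any single feed_chunk() call. */
typedef struct
{
    char        partial[TOK_MAX];       /* bytes of a token split across chunks */
    size_t      partial_len;
    char        last_scalar[TOK_MAX];   /* most recent complete scalar */
    int         scalars_seen;
} inc_state;

/*
 * Consume one chunk of comma-separated scalars.  A token completes only
 * when its trailing comma arrives; leftover bytes stay in *st, so they
 * are still there when the next chunk is fed in.
 */
static void
feed_chunk(inc_state *st, const char *chunk, size_t len)
{
    for (size_t i = 0; i < len; i++)
    {
        if (chunk[i] == ',')
        {
            memcpy(st->last_scalar, st->partial, st->partial_len);
            st->last_scalar[st->partial_len] = '\0';
            st->partial_len = 0;
            st->scalars_seen++;
        }
        else
        {
            assert(st->partial_len < TOK_MAX - 1);
            st->partial[st->partial_len++] = chunk[i];
        }
    }
}
```

Feeding "123" and then "45,6" yields one complete scalar, "12345", with "6" still pending in the struct. Had the partial bytes been kept in locals of feed_chunk(), they would have been gone by the second call.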

> One possible use would be in parsing large manifest files for
> incremental backup.

I'm keeping an eye on this thread for OAuth, since the clients have to
parse JSON as well. Those responses tend to be smaller, though, so
you'd have to really be hurting for resources to need this.

--Jacob

Attachment: no-strlen.diff.txt (text/plain, 5.9 KB)
