Re: making the backend's json parser work in frontend code

From: David Steele <david(at)pgmasters(dot)net>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-21 22:34:55
Message-ID: 3649c0bb-a4f2-e59b-3322-ea740a179b2f@pgmasters.net
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Robert,

On 1/17/20 2:33 PM, Robert Haas wrote:
> On Fri, Jan 17, 2020 at 12:36 PM David Steele <david(at)pgmasters(dot)net>
wrote:
>
>> I used the callbacks because that's the first method I found but it
>> seems like json_lex() might be easier to use in practice.
>
> Ugh, really? That doesn't seem like it would be nice at all.

I guess it's a matter of how you want to structure the code.

>> I think it's an issue that the entire string must be passed to the lexer
>> at once. That will not be great for large manifests. However, I don't
>> think it will be all that hard to implement an optional "want more"
>> callback in the lexer so JSON data can be fed in from the file in
chunks.
>
> I thought so initially, but now I'm not so sure. The thing is, you
> actually need all the manifest data in memory at once anyway, or so I
> think. You're essentially doing a "full join" between the contents of
> the manifest and the contents of the file system, so you've got to
> scan one (probably the filesystem) and then mark entries in the other
> (probably the manifest) used as you go.

Yeah, having a copy of the manifest in memory is the easiest way to do
validation, but I think you'd want it in a structured format.

We parse the file part of the manifest into a sorted struct array which
we can then do binary searches on by filename.

>> So, that just leaves ereport() as the largest remaining issue? I'll
>> look at that today and Tuesday and see what I can work up.
>
> PFA my work on that topic. As compared with my previous patch series,
> the previous 0001 is dropped and what are now 0001 and 0002 are the
> same as patches from the previous series. 0003 and 0004 are aiming
> toward getting rid of ereport() and, I believe, show a plausible
> strategy for so doing. There are, possibly, things not to like here,
> and it's certainly incomplete, but I think I kinda like this
> direction. Comments appreciated.
>
> 0003 nukes lex_accept(), inlining the logic into callers. I found that
> the refactoring I wanted to do in 0004 was pretty hard without this,
> and it turns out to save code, so I think this is a good idea
> independently of anything else.

No arguments here.

> 0004 adjusts many functions in jsonapi.c to return a new enumerated
> type, JsonParseErrorType, instead of directly doing ereport(). It adds
> a new function that takes this value and a lexing context and throws
> an error. The JSON_ESCAPING_INVALID case is wrong and involves a gross
> hack, but that's fixable with another field in the lexing context.
> More work is needed to really bring this up to scratch, but the idea
> is to make this code have a soft dependency on ereport() rather than a
> hard one.

My first reaction was that if we migrated ereport() first it would make
this all so much easier. Now I'm no so sure.

Having a general json parser in libcommon that is not tied into a
specific error handling/logging system actually sounds like a really
nice thing to have. If we do migrate ereport() the user would always
have the choice to call throw_a_json_error() if they wanted to.

There's also a bit of de-duplication of error messages, which is nice,
especially in the case JSON_ESCAPING_INVALID. And I agree that this
case can be fixed with another field in the lexer -- or at least so it
seems to me.

Though, throw_a_json_error() is not my favorite name. Perhaps
json_ereport()?

Regards,
--
-David
david(at)pgmasters(dot)net

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alvaro Herrera 2020-01-21 22:37:26 Re: Psql patch to show access methods info
Previous Message Alvaro Herrera 2020-01-21 22:33:29 Re: Psql patch to show access methods info