Re: making the backend's json parser work in frontend code

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: David Steele <david(at)pgmasters(dot)net>
Cc: "pgsql-hackers(at)postgresql(dot)org" <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: making the backend's json parser work in frontend code
Date: 2020-01-17 21:33:48
Message-ID: CA+TgmoauCHfN_7+mT9rn9VSSMU55PpXEbLwMZU5cZW=5Je5J5w@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Fri, Jan 17, 2020 at 12:36 PM David Steele <david(at)pgmasters(dot)net> wrote:
> It seems to work just fine. I didn't stress it too hard but I did put
> in one escape and a multi-byte character and check the various data types.

Cool.

> I used the callbacks because that's the first method I found but it
> seems like json_lex() might be easier to use in practice.

Ugh, really? That doesn't seem like it would be nice at all.

> I think it's an issue that the entire string must be passed to the lexer
> at once. That will not be great for large manifests. However, I don't
> think it will be all that hard to implement an optional "want more"
> callback in the lexer so JSON data can be fed in from the file in chunks.

I thought so initially, but now I'm not so sure. The thing is, you
actually need all the manifest data in memory at once anyway, or so I
think. You're essentially doing a "full join" between the contents of
the manifest and the contents of the file system, so you've got to
scan one (probably the filesystem) and then mark entries in the other
(probably the manifest) used as you go.

But this might need more thought. The details probably depend on
exactly how you design it all.

> So, that just leaves ereport() as the largest remaining issue? I'll
> look at that today and Tuesday and see what I can work up.

PFA my work on that topic. As compared with my previous patch series,
the previous 0001 is dropped and what are now 0001 and 0002 are the
same as patches from the previous series. 0003 and 0004 are aiming
toward getting rid of ereport() and, I believe, show a plausible
strategy for so doing. There are, possibly, things not to like here,
and it's certainly incomplete, but I think I kinda like this
direction. Comments appreciated.

0003 nukes lex_accept(), inlining the logic into callers. I found that
the refactoring I wanted to do in 0004 was pretty hard without this,
and it turns out to save code, so I think this is a good idea
independently of anything else.

0004 adjusts many functions in jsonapi.c to return a new enumerated
type, JsonParseErrorType, instead of directly doing ereport(). It adds
a new function that takes this value and a lexing context and throws
an error. The JSON_ESCAPING_INVALID case is wrong and involves a gross
hack, but that's fixable with another field in the lexing context.
More work is needed to really bring this up to scratch, but the idea
is to make this code have a soft dependency on ereport() rather than a
hard one.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachment Content-Type Size
v2-0001-Adjust-src-include-utils-jsonapi.h-so-it-s-not-ba.patch application/octet-stream 6.9 KB
v2-0002-Split-JSON-lexer-parser-from-json-data-type-suppo.patch application/octet-stream 70.1 KB
v2-0004-WIP-Return-errors-rather-than-using-ereport.patch application/octet-stream 32.4 KB
v2-0003-Remove-jsonapi.c-s-lex_accept.patch application/octet-stream 5.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Alexander Korotkov 2020-01-17 21:34:47 Re: Avoid full GIN index scan when possible
Previous Message Alexander Korotkov 2020-01-17 21:33:14 Re: Avoid full GIN index scan when possible