| From: | Nico Williams <nico(at)cryptonector(dot)com> | 
|---|---|
| To: | Robert Haas <robertmhaas(at)gmail(dot)com> | 
| Cc: | Andrew Dunstan <andrew(at)dunslane(dot)net>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org> | 
| Subject: | Re: WIP Incremental JSON Parser | 
| Date: | 2024-01-03 23:36:45 | 
| Message-ID: | ZZXvjd9gSNlYWaRG@ubby | 
| Lists: | pgsql-hackers | 
On Tue, Jan 02, 2024 at 10:14:16AM -0500, Robert Haas wrote:
> It seems like a pretty significant savings no matter what. Suppose the
> backup_manifest file is 2GB, and instead of creating a 2GB buffer, you
> create a 1MB buffer and feed the data to the parser in 1MB chunks.
> Well, that saves 2GB less 1MB, full stop. Now if we address the issue
> you raise here in some way, we can potentially save even more memory,
> which is great, but even if we don't, we still saved a bunch of memory
> that could not have been saved in any other way.
You could also build a streaming incremental parser.  That is, one that
outputs a path and a leaf value for each leaf it encounters (where leaf
values are the scalars: `null`, `true`, `false`, numbers, and strings).
If the caller is doing something JSONPath-like, it can probably free
almost all allocations immediately and even terminate the parse early.
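
As a rough illustration of the path-and-leaf idea, here is a minimal Python sketch. It is not PostgreSQL's parser and not code from this thread; the names `PathLeafParser`, `feed`, `close`, and `first_match` are hypothetical. It assumes well-formed input, does not validate commas or colons, emits nothing for empty containers, and rescans a token that straddles a chunk boundary, so it shows the shape of the interface rather than a production design.

```python
import json
import re

# One complete token. A number or literal only counts as complete once a
# delimiter follows it, so "12" at the end of a chunk waits for more input.
TOKEN = re.compile(
    r'\s*(?:([\[\]{}:,])'
    r'|("(?:[^"\\]|\\.)*")'
    r'|((?:-?\d+(?:\.\d+)?(?:[eE][+-]?\d+)?|true|false|null)(?=[\s,\]}])))'
)


class PathLeafParser:
    """Hypothetical push parser: feed() text chunks, get (path, leaf) pairs."""

    def __init__(self):
        self.buf = ""    # unconsumed tail of the input seen so far
        self.stack = []  # one [kind, key_or_index] frame per open container

    def _value_done(self):
        # A value just ended: advance the array index or forget the object key.
        if self.stack:
            top = self.stack[-1]
            top[1] = top[1] + 1 if top[0] == "a" else None

    def feed(self, chunk):
        """Consume one chunk and return the (path, leaf) pairs it completed."""
        self.buf += chunk
        events = []
        pos = 0
        while True:
            m = TOKEN.match(self.buf, pos)
            if m is None:
                break  # incomplete token: keep it buffered until more arrives
            pos = m.end()
            punct, string, scalar = m.groups()
            if punct in ("{", "["):
                self.stack.append(["o", None] if punct == "{" else ["a", 0])
            elif punct in ("}", "]"):
                self.stack.pop()
                self._value_done()
            elif punct is None:
                if string is not None and self.stack and self.stack[-1] == ["o", None]:
                    # A string directly inside an object with no pending key is a key.
                    self.stack[-1][1] = json.loads(string)
                else:
                    token = string if string is not None else scalar
                    path = tuple(frame[1] for frame in self.stack)
                    events.append((path, json.loads(token)))
                    self._value_done()
            # ',' and ':' carry no information this sketch needs.
        self.buf = self.buf[pos:]
        return events

    def close(self):
        """Signal end of input; flushes a trailing top-level scalar."""
        events = self.feed(" ")
        if self.buf.strip() or self.stack:
            raise ValueError("incomplete or malformed JSON input")
        return events


def first_match(chunks, wanted):
    """Return the leaf at path `wanted`, reading no more chunks than needed."""
    parser = PathLeafParser()
    for chunk in chunks:
        for path, value in parser.feed(chunk):
            if path == wanted:
                return value  # early termination: the rest is never read
    for path, value in parser.close():
        if path == wanted:
            return value
    return None
```

The parser holds only the unconsumed tail of the current chunk plus one small frame per open container, so memory tracks nesting depth and the longest single token, not document size. A caller looking for a path such as `("files", 0, "size")` can stop feeding chunks as soon as `first_match` returns.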
Nico