From: | Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> |
---|---|
To: | Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> |
Cc: | Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org |
Subject: | Re: Proposal to use JSON for Postgres Parser format |
Date: | 2022-09-21 18:04:16 |
Message-ID: | CAEze2Wi0GJiQLTt29Jcabq5DdVk5_GT6EZzqAnheC98hWx8cuQ@mail.gmail.com |
Views: | Raw Message | Whole Thread | Download mbox | Resend email |
Thread: | |
Lists: | pgsql-hackers |
On Tue, 20 Sept 2022 at 17:29, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2022-Sep-20, Matthias van de Meent wrote:
>
> > Allow me to add: compressability
> >
> > In the thread surrounding [0] there were complaints about the size of
> > catalogs, and specifically the template database. Significant parts of
> > that (688kB of 8080kB a fresh PG14 database) are in pg_rewrite, which
> > consists mostly of serialized Nodes. If we're going to replace our
> > current NodeToText infrastructure, we'd better know we can effectively
> > compress this data.
>
> True. Currently, the largest ev_action values compress pretty well. I
> think if we wanted this to be more succint, we would have to invent some
> binary format -- perhaps something like Protocol Buffers: it'd be stored
> in the binary format in catalogs, but for output it would be converted
> into something easy to read (we already do this for
> pg_statistic_ext_data for example). We'd probably lose compressibility,
> but that'd be okay because the binary format would already remove most
> of the redundancy by nature.
>
> Do we want to go there?
I don't think that a binary format would be much better for
debugging/fixing than an optimization of the current textual format
when combined with compression. As I mentioned in that thread, there
is a lot of improvement possible with the existing format, and I think
any debugging of serialized nodes would greatly benefit from using a
textual format.
Then again, I also agree that this argument doesn't hold it's weight
when storage and output formats are going to be different. I trust
that any new tooling introduced as a result of this thread will be
better than what we have right now.
As for best format: I don't know. The current format is usable, and a
better format would not store any data for default values. JSON can do
that, but I could think of many formats that could do the same (Smile,
BSON, xml, etc.).
I do not think that protobuf is the best choice for storage, though,
because it has its own rules on what it considers a default value and
what it does or does not serialize: zero is considered the only
default for numbers, as is the empty string for text, etc.
I think it is allright for general use, but with e.g. `location: -1`
in just about every parse node we'd probably want to select our own
values to ignore during (de)serialization of fields.
Kind regards,
Matthias van de Meent
From | Date | Subject | |
---|---|---|---|
Next Message | Andres Freund | 2022-09-21 18:21:14 | Re: [RFC] building postgres with meson - v13 |
Previous Message | Nathan Bossart | 2022-09-21 18:01:26 | Re: [PoC] Improve dead tuple storage for lazy vacuum |