Quick Links

Re: Proposal to use JSON for Postgres Parser format

From:	Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To:	Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>
Cc:	Alexander Korotkov <aekorotkov(at)gmail(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Peter Geoghegan <pg(at)bowt(dot)ie>, Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>, pgsql-hackers(at)postgresql(dot)org
Subject:	Re: Proposal to use JSON for Postgres Parser format
Date:	2022-09-21 18:04:16
Message-ID:	CAEze2Wi0GJiQLTt29Jcabq5DdVk5_GT6EZzqAnheC98hWx8cuQ@mail.gmail.com
Views:	Whole Thread \| Raw Message \| Download mbox \| Resend email
Thread:
Lists:	pgsql-hackers

On Tue, 20 Sept 2022 at 17:29, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org> wrote:
>
> On 2022-Sep-20, Matthias van de Meent wrote:
>
> > Allow me to add: compressability
> >
> > In the thread surrounding [0] there were complaints about the size of
> > catalogs, and specifically the template database. Significant parts of
> > that (688kB of 8080kB a fresh PG14 database) are in pg_rewrite, which
> > consists mostly of serialized Nodes. If we're going to replace our
> > current NodeToText infrastructure, we'd better know we can effectively
> > compress this data.
>
> True. Currently, the largest ev_action values compress pretty well. I
> think if we wanted this to be more succint, we would have to invent some
> binary format -- perhaps something like Protocol Buffers: it'd be stored
> in the binary format in catalogs, but for output it would be converted
> into something easy to read (we already do this for
> pg_statistic_ext_data for example). We'd probably lose compressibility,
> but that'd be okay because the binary format would already remove most
> of the redundancy by nature.
>
> Do we want to go there?

I don't think that a binary format would be much better for
debugging/fixing than an optimization of the current textual format
when combined with compression. As I mentioned in that thread, there
is a lot of improvement possible with the existing format, and I think
any debugging of serialized nodes would greatly benefit from using a
textual format.

Then again, I also agree that this argument doesn't hold it's weight
when storage and output formats are going to be different. I trust
that any new tooling introduced as a result of this thread will be
better than what we have right now.

As for best format: I don't know. The current format is usable, and a
better format would not store any data for default values. JSON can do
that, but I could think of many formats that could do the same (Smile,
BSON, xml, etc.).

I do not think that protobuf is the best choice for storage, though,
because it has its own rules on what it considers a default value and
what it does or does not serialize: zero is considered the only
default for numbers, as is the empty string for text, etc.
I think it is allright for general use, but with e.g. `location: -1`
in just about every parse node we'd probably want to select our own
values to ignore during (de)serialization of fields.

Kind regards,

Matthias van de Meent

In response to

Re: Proposal to use JSON for Postgres Parser format at 2022-09-20 15:28:57 from Alvaro Herrera

Responses

Re: Proposal to use JSON for Postgres Parser format at 2022-10-27 14:38:46 from Michel Pelletier

Browse pgsql-hackers by date

	From	Date	Subject
Next Message	Andres Freund	2022-09-21 18:21:14	Re: [RFC] building postgres with meson - v13
Previous Message	Nathan Bossart	2022-09-21 18:01:26	Re: [PoC] Improve dead tuple storage for lazy vacuum