Re: Reducing output size of nodeToString

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: Peter Eisentraut <peter(at)eisentraut(dot)org>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>
Subject: Re: Reducing output size of nodeToString
Date: 2024-01-31 16:17:03
Message-ID: CAEze2Wgd1Z+7Z2bb8Q4Nnk1ki55aH0acWxAyO7TfesMozVs5JQ@mail.gmail.com
Lists: pgsql-hackers

On Wed, 31 Jan 2024, 09:16 Peter Eisentraut, <peter(at)eisentraut(dot)org> wrote:

> On 30.01.24 12:26, Matthias van de Meent wrote:
> >> Most of the other defaults I'm doubtful about. First, we are colliding
> >> here between the goals of minimizing the storage size and making the
> >> debug output more readable.
> > I've never really wanted to make the output "more readable". The
> > current one is too verbose, yes.
>
> My motivations at the moment to work in this area are (1) to make the
> output more readable, and (2) to reduce maintenance burden of node
> support functions.
>
> There can clearly be some overlap with your goals. For example, a less
> verbose and less redundant output can ease readability. But it can also
> go the opposite direction; a very minimalized output can be less readable.
>
> I would like to understand your target more. You have shown some
> figures how these various changes reduce storage size in pg_rewrite.
> But it's a few hundred kilobytes, if I read this correctly, maybe some
> megabytes if you add a lot of user views. Does this translate into any
> other tangible benefits, like you can store more views, or processing
> views is faster, or something like that?

I was also thinking about smaller per-attribute expression storage: index
attribute expressions, table default expressions, and functions.
Beyond that, a smaller serialized form of these constructs means less
memory overhead, which also helps catalog cache sizes, etc.
People complained about the size of a fresh initdb, and I agreed with them,
so I started looking for low-hanging fruit, and this is one.

I've not done any tests yet on whether it's more performant in general. I'd
expect the new code to do a bit better given the extremely verbose nature
of the data and the rather complex byte-at-a-time token read method used,
but this is currently only a hypothesis.
I do think that serialization itself may be slightly slower, but given that
this generally happens only in DDL, and that we have to grow the output
buffer less often, this too may still be a net win (but, again, this is an
untested hypothesis).

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)
