Re: Reducing output size of nodeToString

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Peter Eisentraut <peter(at)eisentraut(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>
Subject: Re: Reducing output size of nodeToString
Date: 2024-01-31 17:47:39
Message-ID: CA+TgmoYkeSEFCF6Q4yKL85pmMT9u7F1SiK52LeGKC7smQg6gcQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Wed, Jan 31, 2024 at 11:17 AM Matthias van de Meent
<boekewurm+postgres(at)gmail(dot)com> wrote:
> I was also thinking about smaller per-attribute expression storage, for index attribute expressions, table default expressions, and functions. Other than that, less memory overhead for the serialized form of these constructs also helps for catalog cache sizes, etc.
> People complained about the size of a fresh initdb, and I agreed with them, so I started looking at low-hanging fruits, and this is one.
>
> I've not done any tests yet on whether it's more performant in general. I'd expect the new code to do a bit better given the extremely verbose nature of the data and the rather complex byte-at-a-time token read method used, but this is currently hypothesis.
> I do think that serialization itself may be slightly slower, but given that this generally happens only in DDL, and that we have to grow the output buffer less often, this too may still be a net win (but, again, this is an untested hypothesis).

I think we're going to have to have separate formats for debugging and
storage if we want to get very far here. The current format sucks for
readability because it's so verbose, and tightening that up where we
can makes sense to me. For me, that can include things like emitting
unset location fields for sure, but delta-encoding of bitmap sets is
more questionable. Turning 1 2 3 4 5 6 7 8 9 10 into 1-10 would be
fine with me because that is both shorter and more readable, but
turning 2 4 6 8 10 into 2 2 2 2 2 is way worse for a human reader.
Such optimizations might make sense in a format that is designed for
computer processing only but not one that has to serve multiple
purposes.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Jelte Fennema-Nio 2024-01-31 17:49:40 Re: Flushing large data immediately in pqcomm
Previous Message Fabrízio de Royes Mello 2024-01-31 17:39:28 Re: speed up a logical replica setup