Re: Reducing output size of nodeToString

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>
Subject: Re: Reducing output size of nodeToString
Date: 2024-01-09 08:23:20
Message-ID: fb1f1eae-1c1d-4ece-af41-f306436dcc0d@eisentraut.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 04.01.24 00:23, Matthias van de Meent wrote:
> Something like the attached? It splits out into the following
> 0001: basic 'omit default values'

/* Write an integer field (anything written as ":fldname %d") */
-#define WRITE_INT_FIELD(fldname) \
+#define WRITE_INT_FIELD_DIRECT(fldname) \
appendStringInfo(str, " :" CppAsString(fldname) " %d",
node->fldname)
+#define WRITE_INT_FIELD_DEFAULT(fldname, default) \
+ ((node->fldname == default) ? (0) : WRITE_INT_FIELD_DIRECT(fldname))
+#define WRITE_INT_FIELD(fldname) \
+ WRITE_INT_FIELD_DEFAULT(fldname, 0)

Do we need the _DIRECT macros at all? Could we not combine that into
the _DEFAULT ones?

I think the way the conditional operator (?:) is written is not
technically correct C, because one side has an integer result (0) and
the other a void result (from appendStringInfo()). Also, this could
break accidentally even more if the result type of appendStringInfo()
was changed for some reason. I think it would be better to write this
in a more straightforward way like

#define WRITE_INT_FIELD_DEFAULT(fldname, default) \
do { \
if (node->fldname == default) \
appendStringInfo(str, " :" CppAsString(fldname) " %d",
node->fldname); \
while (0)

Relatedly, this

+/* a scaffold function to read an optionally-omitted field */
+#define READ_OPT_SCAFFOLD(fldname, read_field_code, default_value) \
+ if (pg_strtoken_next(":" CppAsString(fldname))) \
+ { \
+ read_field_code; \
+ } \
+ else \
+ local_node->fldname = default_value

would need to be written with a do { } while (0) wrapper around it.

> 0002: reset location and other querystring-related node fields for all
> catalogs of type pg_node_tree.

This goal makes sense, but I think it can be done in a better way. If
you look into the area of stringToNode(), stringToNodeWithLocations(),
and stringToNodeInternal(), there already is support for selectively
resetting or omitting location fields. Notably, this works with the
existing automated knowledge of where the location fields are and
doesn't require a new hand-maintained table. I think the way forward
here would be to apply a similar toggle to nodeToString() (the reverse).

> 0003: add default marking on typmod fields.
> 0004 & 0006: various node fields marked with default() based on
> observed common or initial values of those fields

I think we could get about half the benefit here more automatically, by
creating a separate type for typmods, like

typedef int32 TypMod;

and then having the node support automatically generate the
serialization support with a -1 default.

(A similar thing could be applied to the location fields, which would
allow us to get rid of the current hack of parsing out the name.)

Most of the other defaults I'm doubtful about. First, we are colliding
here between the goals of minimizing the storage size and making the
debug output more readable. If a Query dump would now omit the
commandType field if it is CMD_SELECT, I think that would be widely
confusing, and one would need to check the source code to identify the
reason. Also, what if we later decide to change a "default" for a
field. Then output between version would differ. Of course, node
output does change between versions in general, but these kinds of
differences would be confusing. Second, this relies on hand-maintained
annotations that were created by you presumably through a combination of
intuition and testing, based on what is in the template database. Do we
know whether this matches real-world queries created by users later?
Also, my experience dealing with the node support over the last little
while is that these manually maintained exceptions get ossified and
outdated and create a maintenance headache for the future.

> 0005: truncate trailing 0s from outDatum

Does this significantly affect anything other than the "name" type?
User views don't usually use the "name" type, so this would have limited
impact outside of system views.

> 0007 (new): do run-length + gap coding for bitmapset and the various
> integer list types. This saves a surprising amount of bytes.

Can you show examples of this? How would this affects the ability to
manually interpret the output?

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message vignesh C 2024-01-09 08:33:34 Re: Custom explain options
Previous Message Laurenz Albe 2024-01-09 08:22:02 Re: psql JSON output format