Re: Reducing output size of nodeToString

From: Peter Eisentraut <peter(at)eisentraut(dot)org>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: Robert Haas <robertmhaas(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, Andres Freund <andres(at)anarazel(dot)de>, Michel Pelletier <pelletier(dot)michel(at)gmail(dot)com>
Subject: Re: Reducing output size of nodeToString
Date: 2024-03-22 09:18:33
Message-ID: 9f8b880d-e040-4c37-9fdf-c7b55ac11892@eisentraut.org
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On 20.03.24 13:03, Matthias van de Meent wrote:
> On Wed, 20 Mar 2024 at 12:49, Peter Eisentraut <peter(at)eisentraut(dot)org> wrote:
>>
>> On 19.03.24 17:13, Peter Eisentraut wrote:
>>> On 11.03.24 21:52, Matthias van de Meent wrote:
>>>>> * v7-0003-gen_node_support.pl-Mark-location-fields-as-type-.patch
>>>>>
>>>>> This looks sensible, but maybe making Location a global type is a bit
>>>>> much? Maybe something more specific like ParseLocation, or ParseLoc, to
>>>>> keep it under 12 characters.
>>>> I've gone with ParseLoc in the attached v8 patchset.
>>>
>>> I have committed this one.
>>
>> Next, I was looking at
>> v8-0003-pg_node_tree-Don-t-store-query-text-locations-in-.patch.
>
> [...]
>
>> So anyway, my idea was that we should turn this around and make
>> nodeToString() always drop location information, and instead add
>> nodeToStringWithLocations() for the few debugging uses. And this would
>> also be nice because then it matches exactly with the existing
>> stringToNodeWithLocations().
>
> That seems reasonable, yes.

I have committed that one.

This takes care of your patches v8-0002 and v8-0003.

About the rest of your patch set:

As long as we have only one output format, we need to balance several
uses, including debugging, storage size, (de)serialization performance.

Your patches v8-0005 and up are clearly positive for storage size but
negative for debugging. So I think we can't consider them now.

Your patches v8-0001 ("pg_node_tree: Omit serialization of fields with
default values.") and v8-0004 ("gen_node_support.pl: Add a TypMod type
for signalling TypMod behaviour") are also good for storage size. I
don't know how they affect serialization performance. I also don't know
how good they are for debugging. I have argued here and there that
omitting unset fields can make node dumps more readable. But that's
just me. I have looked at a lot of Query and RangeTblEntry nodes
lately, which contain many rarely used fields. But other people might
have completely different experiences, with other node and tree types.
We didn't get much feedback from anyone else in this thread, so I'm very
hesitant to impose this on everyone without any consensus.

I could see "Omit serialization of fields with default values" as a
separate toggle for debug node dumps.

Also, there is clearly some lingering interesting in a separate
binary-ish serialization format for internal use. This should probably
also take a look at (de)serialization performance, which we haven't
really done in this thread. In a way, with the omit default values
patch, the serialization and deserialization does more work, so it could
have an adverse impact. But we don't know.

I think to proceed we need more buy-in on the "what do I want from my
node dumps" side, and more performance numbers on the other side.
Saving a few hundred kilobytes on view storage is fine but isn't by
itself that useful, I think, if it potentially negatively affects other
uses.

In response to

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Langote 2024-03-22 09:23:42 Re: Memory consumed by child SpecialJoinInfo in partitionwise join planning
Previous Message Michał Kłeczek 2024-03-22 09:11:51 Re: DRAFT: Pass sk_attno to consistent function