Re: Detoasting optionally to make Explain-Analyze less misleading

From: stepan rutz <stepan(dot)rutz(at)gmx(dot)de>
To: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
Cc: pgsql-hackers(at)postgresql(dot)org
Subject: Re: Detoasting optionally to make Explain-Analyze less misleading
Date: 2023-09-15 20:09:41
Message-ID: 03f952d6-b773-be93-e5d5-8f9ea9c2d461@gmx.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

please see a revised version yesterday's mail. The patch attached now
provides the following:

EXPLAIN(ANALYZE,SERIALIZE)

and

EXPLAIN(ANALYZE,SERIALIZEBINARY)

and timing output.

Both options perform the serialization during analyze and provide an
additional output in the plan like this:

template1=# explain (analyze,serialize) select * from t12 limit 1;
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------

 ...

 Serialized Bytes: 36 bytes
 Execution Time: 0.035 ms
(5 rows)

or also this

template1=# explain (analyze,serialize) select * from t1;
                                         QUERY PLAN
---------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..1.02 rows=2 width=19) (actual
time=0.101..0.111 rows=5 loops=1)
 Planning Time: 0.850 ms
 Serialized Bytes: 85777978 bytes
 Execution Time: 354.284 ms
(4 rows)

Its tempting to divide Serialized-Bytes by Execution-Time to get an idea
of the serialization bandwidth. This is /dev/null serialization though.
The results are length-counted and then discarded.

Since detoasting happens implicitly during serialization, the number of
bytes becomes huge in this case and accounts for the detoasted lengths
as well. I tried to get the number of bytes send for the protocol's
messages and the attribute headers correctly. For the actual values I am
quite sure I get the correct measures, as one can really tell by sending
more values across. Null is 4 bytes on the wire interestingly. I didn't
know that, but it makes sense, since its using the same prefix
length-field as all values do.

I have checked the JBDC driver and it uses binary and text formats
depending on an attribute's type oid. So having the SERIALIZEBINARY
option is not accurate, as in reality both formats can be occur for the
same tuple.

Please provide some feedback on the new patch and let me know if this
makes sense. In general this kind of option for EXPLAIN is a good thing
for sure.

Greetings,

Stepan

On 14.09.23 21:27, stepan rutz wrote:
> Hi Tom, Hi Matthias,
>
> you are right of course. I have looked at the code from printtup.c and
> made a new version of the patch.
>
> Thanks for the MemoryContextReset hint too (@Matthias)
>
> This time is called  EXPLAIN(ANALYZE,SERIALIZE) (hey, it also sounds
> nicer phonetically)
>
> If the option SERIALIZE is active, the output functions are called and
> they perform the detoasting, which I have even checked.
>
> So things are better this way, however I hardcoded the output option
> "Text" (format=0). In printtup.c there is an incoming array which
> applies Text (format=0) or Binary (format=1) for each column
> individually. I am not sure whether this is even needed. I left in the
> if-statement from printtup.c which calls the binary output method of a
> given type. The result of the output is ignored and apparently free'd
> because of the memory-context-reset at the end.
>
> Please also note, that I added a call to DestReceiver's rDestroy hook,
> which was missing from explain.c before altogether.
>
> Feedback is appreciated.
>
> /Stepan
>
>
> On 12.09.23 17:26, Tom Lane wrote:
>> Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com> writes:
>>> Hmm, maybe we should measure the overhead of serializing the tuples
>>> instead.
>>> The difference between your patch and "serializing the tuples, but not
>>> sending them" is that serializing also does the detoasting, but also
>>> includes any time spent in the serialization functions of the type. So
>>> an option "SERIALIZE" which measures all the time the server spent on
>>> the query (except the final step of sending the bytes to the client)
>>> would likely be more useful than "just" detoasting.
>> +1, that was my immediate reaction to the proposal as well. Some
>> output functions are far from cheap.  Doing only the detoast part
>> seems like it's still misleading.
>>
>> Do we need to go as far as offering both text-output and binary-output
>> options?
>>
>>             regards, tom lane

Attachment Content-Type Size
0004_explain_analyze_and_serialize.patch text/x-patch 16.6 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message David E. Wheeler 2023-09-15 20:13:22 Re: JSON Path and GIN Questions
Previous Message Yurii Rashkovskii 2023-09-15 18:46:35 ALTER ROLE documentation improvement