Re: Detoasting optionally to make Explain-Analyze less misleading

From: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>
To: stepan rutz <stepan(dot)rutz(at)gmx(dot)de>
Cc: Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>, Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Detoasting optionally to make Explain-Analyze less misleading
Date: 2024-03-12 12:20:10
Message-ID: CAEze2Wj=w+qDtPqM7qt-ZRXbG7vmvONa-ocfQZ1inGuE58zrRg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Mon, 26 Feb 2024 at 21:54, stepan rutz <stepan(dot)rutz(at)gmx(dot)de> wrote:
>
> Hi Matthias, thanks for picking it up. I still believe this is valuable
> to a lot of people out there. Thanks for dealing with my proposal.
> Matthias, Tom, Tomas everyone.
>
> Two (more or less) controversial remarks from side.
>
> 1. Actually serialization should be the default for "analyze" in
> explain, as current analyze doesn't detoast and thus distorts the result
> in extreme (but common) cases easily by many order of magnitude (see my
> original video on that one). So current "explain analyze" only works for
> some queries and since detoasting is really transparent, it is quite
> something to leave detoasting out of explain analyze. This surprises
> people all the time, since explain analyze suggests it "runs" the query
> more realistically.

I'm not sure about this, but it could easily be a mid-beta decision
(if this is introduced before the feature freeze of 17, whenever that
is).

> 2. The bandwidth I computed in one of the previous versions of the patch
> was certainly cluttering up the explain output and it is misleading yes,
> but then again it performs a calculation people will now do in their
> head. The "bandwidth" here is how much data your query gets out of
> backend by means of the query and the deserialization. So of course if
> you do id-lookups you get a single row and such querries do have a lower
> data-retrieval bandwidth compared to bulk querries.

I think that's a job for post-processing the EXPLAIN output by the
user. If we don't give users the raw data, they won't be able to do
their own cross-referenced processing: "5MB/sec" doesn't tell you how
much time or data was actually spent.

> However having some
> measure of how fast data is delivered from the backend especially on
> larger joins is still a good indicator of one aspect of a query.

I'm not sure about that. Network speed is a big limiting factor that
we can't measure here, and the size on disk is often going to be
smaller than the data size when transfered across the network.

> Sorry for the remarks. Both are not really important, just restating my
> points here. I understand the objections and reasons that speak against
> both points and believe the current scope is just right.

No problem. Remarks from users (when built on solid arguments) are
always welcome, even if we may not always agree on the specifics.

------>8------

Attached is v9, which is rebased on master to handle 24eebc65's
changed signature of pq_sendcountedtext.
It now also includes autocompletion, and a second patch which adds
documentation to give users insights into this new addition to
EXPLAIN.

Kind regards,

Matthias van de Meent

Attachment Content-Type Size
v9-0002-Add-EXPLAIN-SERIALIZE-docs.patch application/octet-stream 2.7 KB
v9-0001-Explain-Add-SERIALIZE-option.patch application/octet-stream 28.9 KB

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Amit Kapila 2024-03-12 12:21:43 Re: Introduce XID age and inactive timeout based replication slot invalidation
Previous Message Alexander Korotkov 2024-03-12 12:10:59 Re: collect_corrupt_items_vacuum.patch