Re: Detoasting optionally to make Explain-Analyze less misleading

From: stepan rutz <stepan(dot)rutz(at)gmx(dot)de>
To: Matthias van de Meent <boekewurm+postgres(at)gmail(dot)com>, Tomas Vondra <tomas(dot)vondra(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, pgsql-hackers(at)postgresql(dot)org
Subject: Re: Detoasting optionally to make Explain-Analyze less misleading
Date: 2024-02-26 20:54:11
Message-ID: ea885631-21f1-425a-97ed-c4bfb8cf9c63@gmx.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi Matthias, thanks for picking it up. I still believe this is valuable
to a lot of people out there. Thanks for dealing with my proposal.
Matthias, Tom, Tomas everyone.

Two (more or less) controversial remarks from side.

1. Actually serialization should be the default for "analyze" in
explain, as current analyze doesn't detoast and thus distorts the result
in extreme (but common) cases easily by many order of magnitude (see my
original video on that one). So current "explain analyze" only works for
some queries and since detoasting is really transparent, it is quite
something to leave detoasting out of explain analyze. This surprises
people all the time, since explain analyze suggests it "runs" the query
more realistically.

2. The bandwidth I computed in one of the previous versions of the patch
was certainly cluttering up the explain output and it is misleading yes,
but then again it performs a calculation people will now do in their
head. The "bandwidth" here is how much data your query gets out of
backend by means of the query and the deserialization. So of course if
you do id-lookups you get a single row and such querries do have a lower
data-retrieval bandwidth compared to bulk querries. However having some
measure of how fast data is delivered from the backend especially on
larger joins is still a good indicator of one aspect of a query.

Sorry for the remarks. Both are not really important, just restating my
points here. I understand the objections and reasons that speak against
both points and believe the current scope is just right.

/Stepan

On 26.02.24 20:30, Matthias van de Meent wrote:
> Hi,
>
> I've taken the liberty to update this patch, and register it in the
> commitfest app to not lose track of progress [0].
>
> The attached v8 patch measures scratch memory allocations (with MEMORY
> option), total time spent in serialization (with TIMING on, measures
> are inclusive of unseparated memcpy to the message buffer), and a
> count of produced bytes plus the output format used (text or binary).
> It's a light rework of the earlier 0007 patch, I've reused tests and
> some infrastructure, while the implementation details and comments
> have been updated significantly.
>
> I think we can bikeshed on format and names, but overall I think the
> patch is in a very decent shape.
>
> Stepan, thank you for your earlier work, and feel free to check it out
> or pick it up again if you want to; else I'll try to get this done.
>
> Kind regards,
>
> Matthias van de Meent
>
> [0] https://commitfest.postgresql.org/47/4852/

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Melanie Plageman 2024-02-26 20:56:57 Re: Streaming read-ready sequential scan code
Previous Message Andrew Atkinson 2024-02-26 20:40:05 Re: An improved README experience for PostgreSQL