Re: [PATCH] pg_stat_toast

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Gunnar Nick Bluth <gunnar(dot)bluth(at)pro-open(dot)de>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Subject: Re: [PATCH] pg_stat_toast
Date: 2022-04-06 15:49:59
Message-ID: CA+Tgmobwt6TNyi_g1OwwD20DkyQasM5ftD0qUZTkreqbUoUHNg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Apr 5, 2022 at 6:08 PM Gunnar "Nick" Bluth
<gunnar(dot)bluth(at)pro-open(dot)de> wrote:
> At the end of the day, one could argue like you did there for almost all
> (non-attribute) stats. "Why track function execution times? Just set up
> a benchmark and call the function 1 mio times and you'll know how long
> it takes on average!". "Why track IO timings? Run a benchmark on your
> system and ..." etc. pp.
>
> I maintain a couple of DBs that house TBs of TOASTable data (mainly XML
> containing encrypted payloads). In just a couple of columns per cluster.
> I'm completely clueless if TOAST compression makes a difference there.
> Or externalization.
> And I'm not allowed to copy that data anywhere outside production
> without unrolling a metric ton of red tape.
> Guess why I started writing this patch ;-)
> *I* would certainly leave the option on, just to get an idea of what's
> happening...

I feel like if you want to know whether externalization made a
difference, you can look at the size of the TOAST table. And by
selecting directly from that table, you can even see how many chunks
it contains, and how many are full-sized chunks vs. partial chunks,
and stuff like that. And for compression, how about looking at
pg_column_size(col1) vs. pg_column_size(col1||'') or something like
that? You might get a 1-byte varlena header on the former and a 4-byte
varlena header on the latter even if there's no compression, but any
gains beyond 3 bytes have to be due to compression.

> Most DBAs I met will *happily* donate a few CPU cycles (and MBs) to
> gather as much first hand information about their *live* systems.
>
> Why is pg_stat_statements so popular? Even if it costs 5-10% CPU
> cycles...? If I encounter a tipped-over plan and have a load1 of 1200 on
> my production server, running pgbadger on 80GB of (compressed) full
> statement logs will just not give me the information I need _now_
> (actually, an hour ago). So I happily deploy pg_stat_statements
> everywhere, *hoping* that I'll never really need it...
>
> [ additional arguments ]

I'm not trying to argue that instrumentation in the database is *in
general* useless. That would be kinda ridiculous, especially since
I've spent time working on it myself.

But all cases are not the same. If you don't use something like
pg_stat_statements or auto_explain or log_min_duration_statement, you
don't have any good way of knowing which of your queries are slow and
how slow they are, and you really need some kind of instrumentation to
help you figure that out. On the other hand, you CAN find out how
effective compression is, at least in terms of space, without
instrumentation, because it leaves state on disk that you can examine
whenever you like. The stuff that the patch tells you about how much
*time* was consumed is data you can't get after-the-fact, so maybe
there's enough value there to justify adding code to measure it. I'm
not entirely convinced, though, because I think that for most people
in most situations doing trial loads and timing them will give
sufficiently good information that they won't need anything else. I'm
not here to say that you must be wrong if you don't agree with me; I'm
just saying what I think based on my own experience.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andrew Dunstan 2022-04-06 15:50:11 Re: SQL/JSON: JSON_TABLE
Previous Message Alvaro Herrera 2022-04-06 15:49:34 Re: [PATCH] pg_stat_toast