Re: [PATCH] pg_stat_toast

From: "Gunnar \"Nick\" Bluth" <gunnar(dot)bluth(at)pro-open(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Andres Freund <andres(at)anarazel(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Subject: Re: [PATCH] pg_stat_toast
Date: 2022-04-05 22:08:13
Lists: pgsql-hackers

Am 05.04.22 um 18:17 schrieb Robert Haas:
> On Thu, Mar 31, 2022 at 9:16 AM Gunnar "Nick" Bluth
> <gunnar(dot)bluth(at)pro-open(dot)de> wrote:
>> That was meant to say "v10", sorry!
> Hi,

Hi Robert,

and thx for looking at this.

> From my point of view, at least, it would be preferable if you'd stop
> changing the subject line every time you post a new version.

Terribly sorry, I believed I was doing the right thing! I've now removed
the "suffix" for good.

> Based on the test results in
> and the comments from Andres in
> my judgement would be that, as things stand today, this patch has no
> chance of being accepted, due to overhead. Now, Andres is currently
> working on an overhaul of the statistics collector and perhaps that
> would reduce the overhead of something like this to an acceptable
> level. If it does, that would be great news; I just don't know whether
> that's the case.

AFAICT, Andres' work is more about the structure (e.g.
13619598f1080d7923454634a2570ca1bc0f2fec). Or have I missed something?

The attached v11 incorporates the latest changes in the area, btw.

Anyway, my (undisputed up to now!) understanding still is that only
backends _looking_ at these stats (so, e.g., accessing the pg_stat_toast
view) actually read the data. So, the 10-15% more space used for pg_stat
only affect the stats collector and _some few_ backends.

And those 10-15% were gathered with 10,000 tables containing *only*
TOASTable attributes. So the actual percentage would probably go down
quite a bit once you add some INTs or such.
Back then, I was curious about the impact myself and just ran a few
quickly hacked-together synthetic tests. I'll happily go ahead and run
some tests on real-world schemas if that helps clarify matters!

> As far as the statistics themselves are concerned, I am somewhat
> skeptical about whether it's really worth adding code for this.
> According to the documentation, the purpose of the patch is to allow
> you to assess choice of storage and compression method settings for a
> column and is not intended to be enabled permanently. However, it

TBH, the wording there is probably a bit over-cautious. I very much
respect Andres and thus his reservations, and I know how careful the
project is about regressions of any kind (see below for some elaboration
on the latter).
I softened the <note> part a bit for v11.

> seems to me that you could assess that pretty easily without this
> patch: just create a couple of different tables with different
> settings, load up the same data via COPY into each one, and see what
> happens. Now you might answer that with the patch you would get more
> detailed and accurate statistics, and I think that's true, but it
> doesn't really look like the additional level of detail would be
> critical to have in order to make a proper assessment. You might also
> say that creating multiple copies of the table and loading the data
> multiple times would be expensive, and that's also true, but you don't
> really need to load it all. A representative sample of 1GB or so would
> probably suffice in most cases, and that doesn't seem likely to be a
> huge load on the system.

At the end of the day, one could argue like you did there for almost all
(non-attribute) stats. "Why track function execution times? Just set up
a benchmark, call the function a million times, and you'll know how long
it takes on average!" "Why track I/O timings? Run a benchmark on your
system and ..." etc.

I maintain a couple of DBs that house TBs of TOASTable data (mainly XML
containing encrypted payloads). In just a couple of columns per cluster.
I'm completely clueless if TOAST compression makes a difference there.
Or externalization.
And I'm not allowed to copy that data anywhere outside production
without unrolling a metric ton of red tape.
Guess why I started writing this patch ;-)
*I* would certainly leave the option on, just to get an idea of what's
going on in there.

> Also, as we add more compression options, it's going to be hard to
> assess this sort of thing without trying stuff anyway. For example if
> you can set the lz4 compression level, you're not going to know which
> level is actually going to work best without trying out a bunch of
> them and seeing what happens. If we allow access to other sorts of
> compression parameters like zstd's "long" option, similarly, if you
> really care, you're going to have to try it.

Funny that you mention it. When writing the first version, I was
thinking about the LZ4 patch authors and was wondering how they
tested/benchmarked all of it and why they didn't implement something
like this patch for their tests ;-)

Yes, you're gonna try it. And you're gonna measure it. Somehow.
Externally, as things are now.
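Just to spell out what that external measuring looks like today (purely
illustrative: the table names and sample file are made up, and per-column
COMPRESSION needs PG 14+):

```sql
-- Two copies of the same column with different compression settings
CREATE TABLE payload_pglz (doc text COMPRESSION pglz);
CREATE TABLE payload_lz4  (doc text COMPRESSION lz4);

-- Load the same representative sample into both (time it with \timing)
COPY payload_pglz (doc) FROM '/tmp/sample.csv' (FORMAT csv);
COPY payload_lz4  (doc) FROM '/tmp/sample.csv' (FORMAT csv);

-- Compare on-disk size, including the TOAST relations
SELECT relname, pg_size_pretty(pg_total_relation_size(oid)) AS total
FROM   pg_class
WHERE  relname IN ('payload_pglz', 'payload_lz4');
```

And then you repeat that dance for every combination of settings you
care about, on a copy of your data...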

With pg_stat_toast, you'd get the byte-by-byte and - maybe even more
important - ms-by-ms comparison of the different compression and
externalization strategies straight from the core of the DB.
I'd fancy that!

And if you get these stats by just flicking a switch (or leaving it on
permanently...), you might start looking at the pg_stat_toast view from
time to time, maybe realizing that your DB server spent hours of CPU
time trying to compress data that's compressed already. Or of which you
_know_ that it's only gonna be around for a couple of seconds...
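Something along these lines, say (to be clear, the column names below are
only my shorthand for what the view tracks, not the actual definition
from the patch):

```sql
-- Hypothetical sketch: which attributes burn the most compression time
-- for the least gain? Column names are illustrative, not the patch's.
SELECT schemaname, relname, attname,
       compressions,          -- how often we tried to compress
       compression_time_ms,   -- how long we spent doing it
       size_orig, size_compressed
FROM   pg_stat_toast
ORDER  BY compression_time_ms DESC
LIMIT  10;
```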

Mind you, a *lot* of people out there aren't even aware that TOAST
exists. Granted, most probably just don't care... ;-)

Plus: this would (potentially, one day) give us information we could
eventually incorporate into EXPLAIN [ANALYZE]. Like, "estimated time for
(un)compressing TOAST values" or so.

> So my feeling is that this feels like a lot of machinery and a lot of
> worst-case overhead to solve a problem that's really pretty easy to
> solve without any new code at all, and therefore I'd be inclined to
> reject it. However, it's a well-known fact that sometimes my feelings
> about things are pretty stupid, and this might be one of those times.
> If so, I hope someone will enlighten me by telling me what I'm
> missing.

Most DBAs I've met will *happily* donate a few CPU cycles (and MBs) to
gather as much first-hand information as possible about their *live*
systems.

Why is pg_stat_statements so popular? Even if it costs 5-10% CPU
cycles...? If I encounter a tipped-over plan and have a load1 of 1200 on
my production server, running pgbadger on 80GB of (compressed) full
statement logs will just not give me the information I need _now_
(actually, an hour ago). So I happily deploy pg_stat_statements
everywhere, *hoping* that I'll never really need it...

Why is "replica" now the default WAL level? Because essentially
everybody changed it anyway, _just in case_. People looking for the last
couple of % disk space will tune it down to "minimal", for everybody
else, the gain in *options* vastly outweighs the additional disk usage.

Why is everybody asking for live execution plans? Or a progress
indication? The effort to get these is ridiculous from what I know,
still I'd fancy them a lot!

One of my clients is currently spending a lot of time (and thus $$$) to
get some profiling software (forgot the name) for their DB2 to work (and
not push AIX into OOM situations, actually ;). And compared to
PostgreSQL, I'm pretty sure you get a lot more insights from a stock DB2
already. As that's what customers ask for...

In essence: if *I* read in the docs "this will give you useful
information" (and saves you the effort of testing in a separate
environment) "but may use up some RAM and disk space for pg_stats", I
flick that switch on and probably leave it there.

And in real-world applications, you'd almost certainly never notice a
difference (we're discussing ~50-60 bytes per attribute, after all;
even 10,000 TOASTable attributes would add well under 1 MB of stats).

I reckon most DBAs (and developers) would give this a spin and leave it
on, out of curiosity first and out of sheer convenience later.
Like, if I run a DB relying heavily on stored procedures, I'll certainly
enable "track_functions".
Now show me the DB without any TOASTable attributes! ;-)
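For comparison, that existing machinery is exactly this cheap to use
(track_functions and pg_stat_user_functions are stock PostgreSQL):

```sql
-- track_functions is an existing GUC: 'pl' tracks procedural-language
-- functions, 'all' additionally tracks SQL and C functions
SET track_functions = 'pl';   -- superuser; or set it in postgresql.conf

-- ... run your workload, then see where the time went:
SELECT funcname, calls, total_time, self_time
FROM   pg_stat_user_functions
ORDER  BY total_time DESC
LIMIT  10;
```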

TBH, I imagine this to be a default "on" GUC parameter *eventually*,
which some people with *very* special needs (and braindead schemas
causing the "worst-case overhead" you mention) turn "off". But alas!
that's not how we add features, is it?

Also, I wouldn't call ~ 583 LOC plus docs & tests "a lot of machinery" ;-).

Again, thanks a lot for peeking at this and
best regards,
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar(dot)bluth(at)pro-open(dot)de
"Ceterum censeo SystemD esse delendam" - Cato

Attachment Content-Type Size
pg_stat_toast_v11.patch text/x-patch 45.4 KB
