[PATCH] pg_stat_toast

From: "Gunnar \"Nick\" Bluth" <gunnar(dot)bluth(at)pro-open(dot)de>
To: pgsql-hackers(at)lists(dot)postgresql(dot)org
Cc: gkokolatos(at)protonmail(dot)com
Subject: [PATCH] pg_stat_toast
Date: 2021-12-12 16:20:58
Message-ID: a08b54fa-7b13-9531-6233-33a3d23773a8@pro-open.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hello -hackers!

Please have a look at the attached patch, which implements some
statistics for TOAST.

The idea (and patch) have been lurking here for quite a while now, so I
decided to dust it off, rebase it to HEAD and send it out for review today.

A big shoutout to Georgios Kokolatos, who gave me a crash course in PG
hacking, some very useful hints and valueable feedback early this year.

I'd like to get some feedback about the general idea, approach, naming
etc. before refining this further.

I'm not a C person and I s**k at git, so please be kind with me! ;-)
Also, I'm not subscribed here, so a CC would be much appreciated!

Why gather TOAST statistics?
============================
TOAST is transparent and opaque at the same time.
Whilst we know that it's there and we know _that_ it works, we cannot
generally tell _how well_ it works.

What we can't answer (easily) are questions like e.g.
- how many datums have been externalized?
- how many datums have been compressed?
- how often has a compression failed (resulted in no space saving)?
- how effective is the compression algorithm used on a column?
- how much time did the DB spend compressing/decompressing TOAST values?

The patch adds some functionality that will eventually be able to answer
these (and probably more) questions.

Currently, #1 - #4 can be answered based on the view contained in
"pg_stats_toast.sql":

postgres=# CREATE TABLE test (i int, lz4 text COMPRESSION lz4, std text);
postgres=# INSERT INTO test SELECT
i,repeat(md5(i::text),100),repeat(md5(i::text),100) FROM
generate_series(0,100000) x(i);
postgres=# SELECT * FROM pg_stat_toast WHERE schemaname = 'public';
-[ RECORD 1 ]--------+----------
schemaname | public
reloid | 16829
attnum | 2
relname | test
attname | lz4
externalizations | 0
compressions | 100001
compressionsuccesses | 100001
compressionsizesum | 6299710
originalsizesum | 320403204
-[ RECORD 2 ]--------+----------
schemaname | public
reloid | 16829
attnum | 3
relname | test
attname | std
externalizations | 0
compressions | 100001
compressionsuccesses | 100001
compressionsizesum | 8198819
originalsizesum | 320403204

Implementation
==============
I added some callbacks in backend/access/table/toast_helper.c to
"pgstat_report_toast_activity" in backend/postmaster/pgstat.c.

The latter (and the other additions there) are essentially 1:1 copies of
the function statistics.

Those were the perfect template, as IMHO the TOAST activities (well,
what we're interested in at least) are very much comparable to function
calls:
a) It doesn't really matter if the TOASTed data was committed, as "the
damage is done" (i.e. CPU cycles were used) anyway
b) The information can (thus/best) be stored on DB level, no need to
touch the relation or attribute statistics

I didn't find anything that could have been used as a hash key, so the
PgStat_StatToastEntry
uses the shiny new
PgStat_BackendAttrIdentifier
(containing relid Oid, attr int).

For persisting in the statsfile, I chose the identifier 'O' (as 'T' was
taken).

What's working?
===============
- Gathering of TOAST externalization and compression events
- collecting the sizes before and after compression
- persisting in statsfile
- not breaking "make check"
- not crashing anything (afaict)

What's missing (yet)?
===============
- proper definition of the "pgstat_track_toast" GUC
- Gathering of times (for compression [and decompression?])
- improve "pg_stat_toast" view and include it in the catalog
- documentation (obviously)
- proper naming (of e.g. the hash key type, functions, view columns etc.)
- would it be necessary to implement overflow protection for the size &
time sums?

Thanks in advance & best regards,
--
Gunnar "Nick" Bluth

Eimermacherweg 106
D-48159 Münster

Mobil +49 172 8853339
Email: gunnar(dot)bluth(at)pro-open(dot)de
__________________________________________________________________________
"Ceterum censeo SystemD esse delendam" - Cato

Attachment Content-Type Size
0001-initial-patch-of-pg_stat_toast-for-hackers.patch text/x-patch 29.2 KB

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Tomas Vondra 2021-12-12 17:45:43 Re: extended stats on partitioned tables
Previous Message Zhihong Yu 2021-12-12 15:37:01 Re: extended stats on partitioned tables