Re: [PATCH] pg_stat_toast

From: Andres Freund <andres(at)anarazel(dot)de>
To: Robert Haas <robertmhaas(at)gmail(dot)com>
Cc: Gunnar Nick Bluth <gunnar(dot)bluth(at)pro-open(dot)de>, Alvaro Herrera <alvherre(at)alvh(dot)no-ip(dot)org>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>, "kuroda(dot)hayato(at)fujitsu(dot)com" <kuroda(dot)hayato(at)fujitsu(dot)com>, Justin Pryzby <pryzby(at)telsasoft(dot)com>
Subject: Re: [PATCH] pg_stat_toast
Date: 2022-04-06 16:55:49
Message-ID: 20220406165549.r24egv5dwkkjhyir@alap3.anarazel.de
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Hi,

On 2022-04-06 12:24:20 -0400, Robert Haas wrote:
> On Wed, Apr 6, 2022 at 12:01 PM Gunnar "Nick" Bluth
> <gunnar(dot)bluth(at)pro-open(dot)de> wrote:
> > Fair enough. At that point, a lot of things become unexpectedly painful.
> > How many % of the installed base may that be though?
>
> I don't have statistics on that, but it's large enough that the
> expense associated with the statistics collector is a reasonably
> well-known pain point, and for some users, a really severe one.

Yea. I've seen well over 100MB/s of write IO solely due to stats files writes
on production systems, years ago.

> I'm fairly sure it's not going to make things so cheap that we can afford to
> add all the statistics anybody wants, but it's so painful that even modest
> relief would be more than welcome.

It definitely doesn't make stats free. But I'm hopefull that avoiding the
regular writing out / readin back in, and the ability to only optionally store
some stats (by varying allocation size or just having different kinds of
stats), will reduce the cost sufficiently that we can start keeping more
stats.

Which is not to say that these stats are the right ones (nor that they're the
wrong ones).

I think if I were to tackle providing more information about toasting, I'd
start not by adding a new stats view, but by adding a function to pgstattuple
that scans the relation and collects stats for each toasted column. An SRF
returning one row for each toastable column. With information like

- column name
- #inline datums
- #compressed inline datums
- sum(uncompressed inline datum size)
- sum(compressed inline datum size)
- #external datums
- #compressed external datums
- sum(uncompressed external datum size)
- sum(compressed external datum size)

IIRC this shouldn't require visiting the toast table itself.

Perhaps also an SRF that returns information about each compression method
separately (i.e. collect above information, but split by compression method)?
Perhaps even with the ability to measure how big the gains of recompressing
into another method would be?

> > > However, experience has taught me that a lot of skepticism is
> > > warranted when it comes to claims about how cheap extensions to the
> > > statistics system will be.
> >
> > Again, fair enough!
> > Maybe we first need statistics about statistics collection and handling? ;-)
>
> Heh.

I've wondered about adding pg_stat_stats the other day, actually :)
https://postgr.es/m/20220404193435.hf3vybaajlpfmbmt%40alap3.anarazel.de

Greetings,

Andres Freund

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Andres Freund 2022-04-06 16:59:30 Re: SQL/JSON: JSON_TABLE
Previous Message Andres Freund 2022-04-06 16:34:19 Re: Mingw task for Cirrus CI