Re: Optimize external TOAST storage

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Nathan Bossart <nathandbossart(at)gmail(dot)com>
Cc: davinder singh <davindersingh2692(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Optimize external TOAST storage
Date: 2022-03-16 14:08:45
Message-ID: CA+TgmoZmDc0+GLcO6eK_REfuCaFrEK14=nnwuQn=5a=JzSYXOg@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Tue, Mar 15, 2022 at 5:48 PM Nathan Bossart <nathandbossart(at)gmail(dot)com> wrote:
> I apologize for thinking out loud a bit here, but I hope this gives you
> some insight into my perspective on this. In general, I am skeptical that
> we can choose one threshold that will work for all PostgreSQL installations
> in the known universe.

I would like to take a slightly contrary position. I think that a
system here that involves multiple knobs is going to be too
complicated to be of any real-world use, because almost nobody will
understand it or tune it properly for their installation. And who is
to say that a single setting would even be appropriate for a whole
installation, as opposed to something that needs to be tuned for each
individual table? A single tunable might be OK, but what I would
really like here is a heuristic that, while perhaps not optimal in
every environment, is good enough that a very high percentage of users
will never need to worry about it. In a case like this, a small gain
that happens automatically feels better than a large gain that
requires manual tweaking in every install.

Now that doesn't answer the question of what that heuristic should be.
I initially thought that if compression didn't end up reducing the
number of TOAST chunks then there was no point, but that's not true,
because having the last chunk be smaller than the rest can allow us to
fit more than 4 into a page rather than exactly 4. However, this
effect isn't likely to be important if most chunks are full-size
chunks. If we insert 100 full-size chunks and 1 partial chunk at the
end, we can't save much even if that chunk ends up being 1 byte
instead of a full-size chunk. 25 TOAST table pages out of 26 are going
to be full of full-size chunks either way, so we can't save more than
~4% and we might easily save nothing at all. As you say, the potential
savings are larger as the values get smaller, because a higher
proportion of the TOAST table pages are not quite full. So I think the
only cases where it's even interesting to think about suppressing this
optimization are those where the value is quite small -- and maybe the
thing to do is just pick a conservatively large threshold and call it
good.

For example, say that instead of applying this technique when there
are at least 2 TOAST chunks, we apply it when there are at least 500
(i.e. normally 1MB). It's hard for me to imagine that we can ever
lose, and in fact I think we could come down quite a bit from there
and still end up winning pretty much all the time. Setting the
threshold to, say, 50 or 100 or 150 TOAST chunks instead of 2 may
leave some money on the table, but if it means that we don't have
meaningful downsides and we don't have tunables for users to fiddle
with, it might be worth it.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Robert Haas 2022-03-16 14:14:56 Re: Corruption during WAL replay
Previous Message Masahiko Sawada 2022-03-16 14:07:51 Re: Logical replication timeout problem