Re: Commitfest 2021-11 Patch Triage - Part 2

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Stephen Frost <sfrost(at)snowman(dot)net>
Cc: Daniel Gustafsson <daniel(at)yesql(dot)se>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Commitfest 2021-11 Patch Triage - Part 2
Date: 2021-11-15 20:23:17
Message-ID: CA+Tgmob7tqjUDcy9ZhPw=kczB=Gj-WxU4G2B0ARcUmJnC_rT2g@mail.gmail.com
Lists: pgsql-hackers

On Mon, Nov 15, 2021 at 2:51 PM Stephen Frost <sfrost(at)snowman(dot)net> wrote:
> I get that just compressing the entire stream is simpler and easier and
> such, but it's surely cheaper and more efficient to not decompress and
> then recompress data that's already compressed. Finding a way to pass
> through data that's already compressed when stored as-is while also
> supporting compression of everything else (in a sensible way- wouldn't
> make sense to just compress each attribute independently since a 4 byte
> integer isn't going to get smaller with compression) definitely
> complicates the overall idea but perhaps would be possible to do.

To me, this feels like an attempt to move the goalposts far enough to
kill the project. Sure, in a perfect world, that would be nice. But,
we don't do it anywhere else. If you try to store a JPEG into a bytea
column, we'll try to compress it just like we would any other data,
and it may not work out. If you then take a pg_basebackup of the
database using -Z, there's no attempt made to avoid the overhead of
CPU overhead of compressing those TOAST table pages that contain
already-compressed data and not the others. And it's easy to
understand why that's the case: when you insert data into the
database, there's no way for the database to magically know whether
that data has been previously compressed by some means, and if so, how
effectively. And when you back up a database, the backup doesn't know
which relfilenodes contain TOAST tables or which pages of those
relfilenodes contain data that is already pre-compressed. In both cases,
your options are either (1) shut off compression yourself or (2) hope
that the compressor doesn't waste too much effort on it.

I think the same approach ought to be completely acceptable here. I
don't even really understand how we could do anything else. printtup()
just gets datums, and it has no idea whether or how they are toasted.
It calls the type output functions which don't know that data is being
prepared for transmission to the client as opposed to some other
hypothetical way you could call that function, nor do they know what
compression method the client wants. It does not seem at all
straightforward to teach them that ... and even if they did, what
then? It's not like every column value is sent as a separate packet;
the whole row is a single protocol message, and some columns may be
compressed and others uncompressed. Deciding what to do about
that boils down to a sheer guess. Unless you try to compress
that mixture of compressed and uncompressed values - and it's
moderately uncommon for every column of a table to even be
toastable - you aren't going to know how well it will compress. You
could easily waste more CPU cycles trying to guess than you would have
spent just doing what the user asked for.

--
Robert Haas
EDB: http://www.enterprisedb.com
