Re: Extending amcheck to check toast size and compression

From: Greg Stark <stark(at)mit(dot)edu>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Heikki Linnakangas <hlinnaka(at)iki(dot)fi>, Aleksander Alekseev <aleksander(at)timescale(dot)com>, PostgreSQL Hackers <pgsql-hackers(at)lists(dot)postgresql(dot)org>
Subject: Re: Extending amcheck to check toast size and compression
Date: 2021-10-19 20:58:47
Message-ID: CAM-w4HNAk+U=Lo0mxj6MwW7HB+T=NCgDv+HRWZS_aUsbwVCzDQ@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

Right so here's a review.

I think the patch is committable as is. It's an improvement and it
does the job as promised. I do have some comments but I don't think
they're serious issues and would actually be pretty happy committing
it as is. Fwiw I didn't realize how short the patch was at first and
it probably doesn't need yet another review.

/* Toasted attributes too large to be untoasted should never be stored */
if (toast_pointer.va_rawsize > VARLENA_SIZE_LIMIT)

1) I know this used to say MaxAlloc -- personally I would probably go
with that but either is fine. But if you're going to have a separate
constant there could be more of a comment explaining why that's the
maximum -- probably with a pointer to MaxAlloc and postgres.h's
VARSIZE macros.

The switch statement at line 1443 seems a bit ... baroque. Is it
clearer than a simple "if cmid != TOAST_PGLZ_COMPRESSION_ID && cmid !=
TOAST_LZ4_COMPRESSION_ID)" ? I mean, I see this is easier to add more
cases to but I found dealing with a case that falls through and no
default is a lot of cognitive overhead to understand what is in the
end just effectively a simple branch.

Fwiw compilers aren't always the best at optimizing switch statements.
It's entirely possible a compiler may end up building a whole lookup
table of jumps for this thing. Not that it's performance critical but
...

But all that's more words than necessary for a minor style comment.

Fwiw I spent a few minutes thinking about and writing up this
suggestion and then only afterwards realized the code in question
wasn't from this patch. I'll mention it anyways but it's not relevant
to this patch review I guess :)

I found the whole expected_chunk_seq parameter thing a bit confusing
and less useful than possible. I would instead suggestion:

Allocate an array of the expected number of chunk numbers before
calling check_toast_tuple and then just gather the chunk_seq that are
returned. When it's finished you can do things like: a) Check if
they're all ascending and report index corruption if not. b) Check if
any numbers are missing and report which ones are missing and/or how
many. c) Check if there are duplicates and report that. These would
all be easier for a user to interpret than "index scan returned chunk
5 when expecting chunk 9".

On Tue, 4 May 2021 at 12:20, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
> Hackers,
>
> During the version 14 development period, a few checks of toasted attributes were written but never committed. For the version 15 development cycle, I'd like to consider extending the checks of toasted attributes. First, no toasted attribute should ever have a rawsize larger than the 1GB varlena limit. Second, no compressed toasted attribute should have an extsize indicating that the toast expanded during toasting. Such a extsize could mean the compression code is malfunctioning, or that the extsize or rawsize fields are corrupt. Third, any compressed attribute should have a valid compression method ID.
>
> These checks are cheap. Actually retrieving the compressed toasted data and checking that it uncompresses correctly would have very different performance implications, but that is not included in this patch.
>
>
>
> —
> Mark Dilger
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>

--
greg

On Wed, 14 Jul 2021 at 10:58, Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com> wrote:
>
>
>
> > On Jul 14, 2021, at 3:33 AM, Heikki Linnakangas <hlinnaka(at)iki(dot)fi> wrote:
> >
> >> +/* The largest valid toast va_rawsize */
> >> +#define VARLENA_SIZE_LIMIT 0x3FFFFFFF
> >> +
> >
> > Hmm, a toasted datum cannot be larger than MaxAllocSize, because it's reconstituted in a palloc'd datum, right?
>
> No datum size exceeds MaxAllocSize, and no datum expands when compressed (because for those that do expand under any particular compression algorithm, we opt to instead store the datum uncompressed), so no valid toast pointer should contain a va_rawsize field greater than MaxAllocSize. Any toast pointers that have larger va_rawsize fields are therefore corrupt.
>
> VARLENA_SIZE_LIMIT is defined here equal to MaxAllocSize:
>
> src/include/utils/memutils.h:#define MaxAllocSize ((Size) 0x3fffffff) /* 1 gigabyte - 1 */
>
> Earlier versions of the patch used MaxAllocSize rather than defining VARLENA_SIZE_LIMIT, but review comments suggested that was less clear.
>
> —
> Mark Dilger
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
>
>
>
>
>

--
greg

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Bossart, Nathan 2021-10-19 21:01:13 Re: Inconsistent behavior of pg_dump/pg_restore on DEFAULT PRIVILEGES
Previous Message Bossart, Nathan 2021-10-19 20:44:35 Re: ALTER INDEX .. RENAME allows to rename tables/views as well