Re: pg_amcheck contrib application

From: Robert Haas <robertmhaas(at)gmail(dot)com>
To: Mark Dilger <mark(dot)dilger(at)enterprisedb(dot)com>
Cc: Tom Lane <tgl(at)sss(dot)pgh(dot)pa(dot)us>, Noah Misch <noah(at)leadboat(dot)com>, Peter Geoghegan <pg(at)bowt(dot)ie>, Thomas Munro <thomas(dot)munro(at)gmail(dot)com>, "Andrey M(dot) Borodin" <x4mmm(at)yandex-team(dot)ru>, Stephen Frost <sfrost(at)snowman(dot)net>, Michael Paquier <michael(at)paquier(dot)xyz>, Amul Sul <sulamul(at)gmail(dot)com>, Dilip Kumar <dilipbalaut(at)gmail(dot)com>, Andres Freund <andres(at)anarazel(dot)de>, PostgreSQL-development <pgsql-hackers(at)postgresql(dot)org>
Subject: Re: pg_amcheck contrib application
Date: 2021-04-01 20:08:56
Message-ID: CA+TgmobpeiMh7Y+vve7eRv8Rru=7FJ_Z2nM8EMxew+sWBr58Og@mail.gmail.com
Views: Raw Message | Whole Thread | Download mbox | Resend email
Thread:
Lists: pgsql-hackers

On Thu, Apr 1, 2021 at 1:41 PM Robert Haas <robertmhaas(at)gmail(dot)com> wrote:
> OK, committed. We still need to deal with what you had as 0003
> upthread, so I guess the next step is for me to spend some time
> reviewing that one.

I did this, and it was a bit depressing. It appears that we now have
duplicate checks for xmin and xmax being out of the valid range.
Somehow you have the removal of those duplicate checks in 0003, but
why in the world didn't you put them into one of the previous patches
and just make 0003 about fixing the
holding-buffer-lock-while-following-TOAST-pointers problem? (And, gah,
why did I not look carefully enough to notice that you hadn't done
that?)

Other than that, I notice a few other things:

- There are a total of two (2) calls in the current source code to
palloc0fast, and hundreds of calls to palloc0. So I think you should
forget about using the fast variant and just do what almost every
other caller does.

- If you want to make this code faster, a better idea would be to
avoid doing all of this allocate and free work and just allocate an
array that's guaranteed to be big enough, and then keep track of how
many elements of that array are actually in use.

- #ifdef DECOMPRESSION_CORRUPTION_CHECKING is not a useful way of
introducing such a feature. Either we do it for real and expose it via
SQL and pg_amcheck as an optional behavior, or we rip it out and
revisit it later. Given the nearness of feature freeze, my vote is for
the latter.

- I'd remove the USE_LZ4 bit, too. Let's not define the presence of
LZ4 data in a non-LZ4-enabled cluster as corruption. If we do, then
people will expect to be able to use this to find places where they
are dependent on LZ4 if they want to move away from it -- and if we
don't recurse into composite datums, that will not actually work.

- check_toast_tuple() has an odd and slightly unclear calling
convention for which there are no comments. I wonder if it would be
better to reverse things and make bool *error the return value and
what is now the return value into a pointer argument, but whatever we
do I think it needs a few words in the comments. We don't need to
slavishly explain every argument -- I think toasttup and ctx and tctx
are reasonably clear -- but this is not.

- To me it would be more logical to reverse the order of the
toast_pointer.va_toastrelid != ctx->rel->rd_rel->reltoastrelid and
VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer) > toast_pointer.va_rawsize
- VARHDRSZ checks. Whether we're pointing at the correct relation
feels more fundamental.

- If we moved the toplevel foreach loop in check_toasted_attributes()
out to the caller, say renaming the function to just
check_toasted_attribute(), we'd save a level of indentation in that
whole function and probably add a tad of clarity, too. You wouldn't
feel the need to Assert(ctx.toasted_attributes == NIL) in the caller
if the caller had just done list_free(ctx->toasted_attributes);
ctx->toasted_attributes = NIL.

- Is there a reason we need a cross-check on both the number of chunks
and on the total size? It seems to me that we should check that each
individual chunk has the size we expect, and that the total number of
chunks is what we expect. The overall size is then necessarily
correct.

- Why are all the changes to the tests in this patch? What do they
have to do with getting the TOAST checks out from under the buffer
lock? I really need you to structure the patch series so that each
patch is about one topic and, equally, so that each topic is only
covered by one patch. Otherwise it's just way too confusing.

- I think some of these messages need a bit of word-smithing, too, but
we can leave that for when we're closer to being done with this.

--
Robert Haas
EDB: http://www.enterprisedb.com

In response to

Responses

Browse pgsql-hackers by date

  From Date Subject
Next Message Heikki Linnakangas 2021-04-01 20:47:37 Re: Force lookahead in COPY FROM parsing
Previous Message Stephen Frost 2021-04-01 20:00:06 Re: New predefined roles- 'pg_read/write_all_data'